Kubernetes ZenPack

This ZenPack provides system monitoring of Kubernetes (K8s) clusters.

Support

This ZenPack is included with commercial versions of Zenoss, and enterprise support for it is provided to Zenoss customers with an active subscription.

Releases

Version 1.0.1
Released on October 25, 2018
Requires PythonCollector ZenPack, ZenPackLib ZenPack
Compatible with Zenoss Resource Manager 6.2 and Zenoss Cloud

Background

This ZenPack monitors Kubernetes (K8s) clusters in Google Cloud Platform (GCP) and locally hosted environments. It uses RBAC authentication to access all data related to modeling and monitoring.

Support Requirements

Zenoss:

  • Zenoss 6.2+
  • ZenPackLib ZenPack 2.1.0+

Kubernetes:

  • Kubernetes versions 1.9.X - 1.10.X
  • Kubernetes API v1 and Metrics API metrics.k8s.io
  • Google Cloud Platform (GCP) and locally hosted Kubernetes

Features

ZenPack features include:

  • Overall Cluster Health Monitoring
  • Health Monitoring for Nodes, Services, Pods
  • Graphs for Kubernetes Cluster, Nodes, Pods, Containers
  • Dashboard Portlets for Pod CPU and Memory consumption
  • Service Impact and Root Cause Analysis
  • Event Management

Kubernetes Structure and Discovery

Objects are automatically discovered via the Kubernetes API. The ZenPack class structure can be visualized in the diagram on the right:

Device (Cluster)

  • Description: The device represents a single Kubernetes cluster.
  • Attributes:
    • buildDate
    • cluster_ip
    • cpu_capacity
    • cpu_usage
    • gcp_cluster
    • memory_capacity
    • memory_usage
    • platform
    • version
  • Relationships:
    • k8sNamespace
    • k8sNode
  • Datasource/Datapoints:
    • event
    • metrics
      • cpu
      • memory
  • Graphs:
    • CPU Utilization
    • Memory Utilization
  • Capacity Thresholds:
    • CPU Capacity
    • Memory Capacity

Namespace

  • Description: Namespaces for Kubernetes
  • Attributes:
    • container_count
    • namespace_uid
    • status
  • Relationships:
    • k8sService
    • k8sPod
    • k8sPersistentVolume

Node

  • Description: Compute nodes that Kubernetes is built from
  • Attributes:
    • architecture
    • cpu_allocatable
    • cpu_capacity
    • cpu_usage
    • ephemeral_storage_allocatable
    • ephemeral_storage_capacity
    • externalIP
    • guest_device
    • internalIP
    • kubeletVersion
    • manageIP
    • memory_allocatable
    • memory_capacity
    • memory_usage
    • modeled_cpu_allocatable
    • modeled_cpu_capacity
    • modeled_memory_allocatable
    • modeled_memory_capacity
    • node_hostname
    • node_type
    • node_uid
    • operatingSystem
    • pods_allocatable
    • pods_capacity
    • region
    • status
  • Relationships:
    • k8sCluster
    • k8sPod
  • Datasource/Datapoints:
    • status
      • status
    • metrics
      • cpu
      • memory
    • allocatable
      • cpu
      • memory
    • capacity
      • cpu
      • memory
  • Graphs:
    • CPU Utilization
    • Memory Utilization
  • Thresholds:
    • High Memory (default: disabled)
    • High CPU Load (default: disabled)

Persistent Volume

  • Description: Storage volume abstraction
  • Attributes:
    • capacity
    • pvc_uid
    • status
    • storageClassName
  • Relationships:
    • k8sNamespace
  • Datasource/Datapoints:
    • status:
      • status

Service

  • Description: Kubernetes Services represent virtual services that are realized by Pods and Containers.
  • Attributes:
    • cluster_ip
    • container_count
    • port_list
    • selector
    • service_type
    • service_uid
  • Relationships:
    • k8sNamespace
    • k8sPods

Pod

  • Description: A group of one or more containers with shared storage/network, and a specification for how to run the containers.
  • Attributes:
    • labels
    • pod_uid
    • status
  • Relationships:
    • k8sNamespace
    • k8sNode
    • k8sContainers
  • Datasource/Datapoints:
    • metrics:
      • cpu
      • memory
    • status:
      • status
  • Graphs:
    • CPU Usage
    • Memory Usage

Container

  • Description: Lowest compute abstraction element for Pods
  • Attributes:
    • cpu_limits
    • cpu_requests
    • image
    • labels
    • memory_limits
    • memory_requests
  • Relationships:
    • k8sPod
  • Datasource/Datapoints:
    • metrics:
      • cpu
      • memory
  • Graphs:
    • CPU Usage
    • Memory Usage
    • Note: Some containers commonly report only partial CPU/memory data, so gaps in these graphs are expected.
  • Thresholds:
    • High CPU Load
    • High Memory

Dashboard Portlets

This ZenPack adds portlets that provide at-a-glance views into Pod and Cluster memory and CPU utilization. Portlets are viewed on the first page upon login, and can be added or removed using the dashboard and portlet controls.

Kubernetes Portlets

The following are portlets specific to Kubernetes:

  • Top K8s Pods by Memory
  • Top K8s Pods CPU

These two portlets can be filtered by:

  • Cluster
  • Namespace
  • Service

Platform Portlets

In addition to Memory and CPU, the following platform portlets support Kubernetes events and issues:

  • Device Issues
  • Event View
  • Open Events
  • Open Events Chart

Usage

RBAC Authentication

You must expose the Kubernetes v1 and metrics.k8s.io APIs on your system. The ZenPack uses role-based access control (RBAC) exclusively for cluster API access. See https://kubernetes.io/docs/reference/access-authn-authz/rbac/ for more information about RBAC authentication.

In general, you must perform at least the following steps for both GCP and locally installed Kubernetes systems:

  1. Set MY_PREFIX and capture ACCOUNT_ID and API_SERVER:

    MY_PREFIX=zenoss
    API_SERVER=$(kubectl cluster-info | head -1 | cut -d' ' -f6 | sed 's/\x1b\[[0-9;]*m//g')

    A. If using GCP, first ensure you are connected to the project associated with your cluster. Then find your ACCOUNT_ID:

        ACCOUNT_ID=$(gcloud info | grep Account | sed -r 's/Account: \[//;s/]//')

    B. If using locally hosted Kubernetes, determine the ACCOUNT_ID as per https://kubernetes.io/docs/setup/scratch/

  2. Set up RBAC authorization:

    kubectl create clusterrolebinding $MY_PREFIX-cluster-admin-binding --clusterrole=cluster-admin --user=$ACCOUNT_ID

  3. Copy the YAML from Appendix: Kubernetes RBAC Setup, save it as zenoss_rbac.yaml, and use it to create the service account (SA) for the role:

    kubectl apply -f zenoss_rbac.yaml

  4. Get the secret token and save it (adjusting zenoss-secret if required):

    TOKEN=$(kubectl describe secret zenoss-secret | sed -n '/^token/p' | cut -d' ' -f7)
    echo $TOKEN

  5. Use the value of $TOKEN as the zKubernetesClusterToken (the Token for Service Account field in step 7).

  6. From the Infrastructure Add pull-down, select Add Kubernetes Cluster

  7. Fill in the following fields:

    • Device Name
    • IP of K8s API ($API_SERVER from above)
    • TCP Port of API
    • Service Account
    • Token for Service Account ($TOKEN from above)

  8. Select the correct Collector for your system

  9. Click the Add button
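
Before adding the device, it can help to confirm that the token really grants API access. The following is a minimal sketch, not part of the ZenPack: the function names are hypothetical, it assumes the secret was exported with kubectl get secret zenoss-secret -o json (where the token is base64-encoded under data.token), and it assumes the API accepts the token as a standard Bearer header.

```python
import base64
import json
import urllib.request


def token_from_secret(secret_json):
    """Decode the token from `kubectl get secret <name> -o json` output.

    In that JSON form the token is base64-encoded under data.token.
    """
    secret = json.loads(secret_json)
    return base64.b64decode(secret["data"]["token"]).decode("ascii")


def build_request(api_server, token, path="api/v1/nodes"):
    """Build (but do not send) an authenticated GET request for the API.

    api_server is the $API_SERVER value captured in step 1.
    """
    url = api_server.rstrip("/") + "/" + path.lstrip("/")
    headers = {"Authorization": "Bearer " + token}
    return urllib.request.Request(url, headers=headers)
```

To send the request, pass it to urllib.request.urlopen together with an SSL context that trusts your cluster's certificate authority. An HTTP 200 response listing nodes suggests that the service account and token are set up correctly.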

Kubernetes Batch Configuration

You can also add your devices in batch for convenience and automation.

  • Attach to the Zope container:

        serviced service attach zope

  • Create a text file (for example, /tmp/batch.txt) with the following content, replacing $TOKEN with your token from above:

        /Devices/Kubernetes
        kubernetes101 zKubernetesClusterIP='10.20.30.40', \
           zKubernetesPort="443", \
           zKubernetesServiceAccount='zenoss', \
           zKubernetesClusterToken='$TOKEN'

  • Now run the zenbatchload command:

        zenbatchload /tmp/batch.txt

  • The device should now load and model automatically
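
When many clusters need to be added, the batch file can be generated rather than typed by hand. The sketch below is illustrative only: batch_text is a hypothetical helper, the device name and values are placeholders, and the output simply mirrors the layout of the batch file shown above.

```python
def batch_text(device_class, device_name, zprops):
    """Render one zenbatchload entry in the layout shown above.

    zprops is a list of (zProperty name, value) pairs; every value is
    written as a single-quoted string.
    """
    assignments = ["%s=%r" % (name, str(value)) for name, value in zprops]
    body = ", \\\n   ".join(assignments)
    return "%s\n%s %s\n" % (device_class, device_name, body)


if __name__ == "__main__":
    # Placeholder values; substitute your own cluster details and token.
    print(batch_text(
        "/Devices/Kubernetes",
        "k8s-cluster-1",
        [
            ("zKubernetesClusterIP", "10.20.30.40"),
            ("zKubernetesPort", 443),
            ("zKubernetesServiceAccount", "zenoss"),
            ("zKubernetesClusterToken", "REPLACE_WITH_TOKEN"),
        ],
    ))
```

Write the result to a file such as /tmp/batch.txt and load it with zenbatchload as described above.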

Adding a Custom Datasource to Metrics

In order to add a metrics datasource, you must be familiar with the API target you wish to call and the resulting JSON data response.

The metrics datasource provided requires three configuration parameters, which we describe below:

  1. api_target: The API target that gets appended to the metrics base API URL
  2. data_path: The path through the returned JSON that identifies the metric
  3. aggregator: Method to aggregate values returned by api_target and data_path.

Together, the api_target and data_path provide the complete information for the datasource to acquire the requested data.
The aggregator provides the method to put that data together to form a single data value.

api_target

The api_target must be a valid path for the API. It must be in a plain REST GET format.

<string1>/<string2>/<string3>

where each <string*> must be a valid string defined in the API.
Examples:

api/v1/nodes
api/v1/pods
apis/metrics.k8s.io/v1beta1/nodes
apis/metrics.k8s.io/v1beta1/pods

These examples supply the entire API path that is required beyond the base URL. Consult the API documentation for further information: https://kubernetes.io/docs/tasks/debug-application-cluster/core-metrics-pipeline/
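
To make the relationship between the base URL and api_target concrete, here is a small sketch. It is an assumption, not the ZenPack's actual code: it presumes the base URL has the form https://<zKubernetesClusterIP>:<zKubernetesPort>/, and metrics_url is a hypothetical helper name.

```python
def metrics_url(cluster_ip, port, api_target):
    """Join an assumed base URL with an api_target path.

    The api_target must be a plain slash-separated path with no empty
    segments, for example "apis/metrics.k8s.io/v1beta1/nodes".
    """
    segments = api_target.split("/")
    if any(not segment for segment in segments):
        raise ValueError("api_target must look like <string1>/<string2>/...")
    return "https://%s:%s/%s" % (cluster_ip, port, api_target)
```

A leading slash, a trailing slash, or a doubled slash produces an empty segment and is rejected, which matches the plain <string1>/<string2>/<string3> form described above.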

data_path

The data_path string represents a path through the returned JSON data. It loosely follows jq-style syntax, which separates path elements (dictionary keys) with dots.
It can include the following items:

  • Plain jq strings. For example: a.b
  • Strings with square brackets with a jq-style identifier:

    items[metadata.name]

    This example scans all list elements in items and selects the metadata.name element from those items.
    To clarify, it matches all items that have the JSON key metadata with sub-key name.
    Note that this form is not useful on its own; its purpose is to filter items down to only those that have a metadata.name structure.

  • Strings with square brackets with a value-qualified jq-styled identifier. This allows you to filter list items that match a dictionary key or value.
    Examples:

    items[metadata.name=server7]
    items[metadata.name=server7].usage
    items[metadata.name=${here/title}].usage
    items[metadata.name=${here/title}].status.capacity

    Note that the last two examples use dynamic TALES expressions instead of static strings to filter the items elements by value.
    Also note that the last three examples append, after the square brackets, the path to the metric within each matching list element.
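
The following sketch shows one way such a data_path could be evaluated against a JSON response. It is an illustration written for this document, not the ZenPack's parser. It assumes that a bracketed filter keeps the matching list elements, that TALES expressions have already been expanded to plain strings, and that filter values are compared as strings. The sample data uses plain numbers for brevity, whereas real metrics responses typically contain strings with units.

```python
import re

# One path element: a key, optionally followed by a [filter] in brackets.
_ELEMENT = re.compile(r"([^.\[\]]+)(?:\[([^\]]*)\])?")


def _lookup(obj, dotted):
    """Follow a dotted key path through nested dicts; None if absent."""
    for key in dotted.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def resolve(data, data_path):
    """Return the list of values that data_path selects from data."""
    current = [data]
    for key, flt in _ELEMENT.findall(data_path):
        selected = []
        for value in current:
            if not isinstance(value, dict) or key not in value:
                continue
            child = value[key]
            if not flt:
                selected.append(child)
                continue
            if not isinstance(child, list):
                continue
            path, has_value, expected = flt.partition("=")
            for item in child:
                found = _lookup(item, path)
                if found is None:
                    continue
                if has_value and str(found) != expected:
                    continue
                selected.append(item)
        current = selected
    return current


if __name__ == "__main__":
    sample = {"items": [
        {"metadata": {"name": "server7"}, "usage": {"cpu": 2}},
        {"metadata": {"name": "server8"}, "usage": {"cpu": 3}},
    ]}
    print(resolve(sample, "items[metadata.name=server7].usage.cpu"))
```

Because a path can select several values, the result is always a list. Reducing that list to a single number is the job of the aggregator.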

aggregator

The required aggregator is selected from the drop-down. Choose from:

  • AVERAGE: Average all elements
  • FIRST: Choose the first element only
  • MAX: Select the maximum value
  • MIN: Select the minimum value
  • PERCENT_AVERAGE: Return average of the data multiplied by 100
  • PERCENT_SUM: Return sum of the data multiplied by 100
  • SUM_OR_ZERO: Sum the data, return zero if no data exists
  • SUM: Sum all the data
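
As a rough illustration of what each aggregator computes, the sketch below reduces a list of numeric values to a single value. It follows the descriptions in the list above. Returning None for empty input (except for SUM_OR_ZERO) is an assumption made for this example, and aggregate is a hypothetical function name.

```python
def aggregate(name, values):
    """Reduce a list of numbers to one value using the named aggregator."""
    values = list(values)
    if not values:
        # Assumption: only SUM_OR_ZERO defines a result for empty input.
        return 0 if name == "SUM_OR_ZERO" else None
    total = sum(values)
    average = total / len(values)
    results = {
        "AVERAGE": average,
        "FIRST": values[0],
        "MAX": max(values),
        "MIN": min(values),
        "PERCENT_AVERAGE": average * 100,
        "PERCENT_SUM": total * 100,
        "SUM_OR_ZERO": total,
        "SUM": total,
    }
    return results[name]
```

For example, with fractional values such as [0.25, 0.75], PERCENT_AVERAGE would yield 50.0, while PERCENT_SUM would yield 100.0.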

Installed Items

Installing this ZenPack will add the following items to your Zenoss system:

Configuration and zProperties

The zProperties and default settings are as follows:

  • zKubernetesClusterIP: The IP address of the Kubernetes Cluster API
  • zKubernetesPort: The TCP port of the API (Default: 443)
  • zKubernetesServiceAccount: The Kubernetes service account associated with the API account. See kubectl get serviceaccounts for more information.
  • zKubernetesClusterToken: The token associated with zKubernetesServiceAccount. See kubectl describe secrets for more information.
  • zKubernetesGuestUseExternalIP: Boolean to set the manageIp to the external IP for host monitoring (Default: True)
  • zKubernetesEventInterval: Polling interval for events (Default: 60)
  • zKubernetesMonitoringInterval: Polling interval for metrics collection (Default: 300)
  • zKubernetesStatusInterval: Polling interval for status updates (Default: 300)
  • zKubernetesContainerNamesModeled: Pattern of Container names to model. Format: namespaces/pods/containers (Default: ["kube-system/.*/.*"])
  • zKubernetesContainerLabelsModeled: Container labels to model. Format: key:value
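
To show how a pattern such as the zKubernetesContainerNamesModeled default might select containers, here is a hedged sketch. It assumes each pattern is a regular expression that must match the whole string namespace/pod/container. The ZenPack's real matching rules may differ, and is_modeled is a hypothetical helper.

```python
import re

# The default value documented above.
DEFAULT_PATTERNS = ["kube-system/.*/.*"]


def is_modeled(namespace, pod, container, patterns=DEFAULT_PATTERNS):
    """Return True if namespace/pod/container matches any pattern."""
    name = "/".join((namespace, pod, container))
    return any(re.fullmatch(pattern, name) for pattern in patterns)
```

Under this assumption, the default models only containers in the kube-system namespace. Adding a pattern such as "default/.*/.*" would also include containers in the default namespace.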

Modeler Plugins

  • Kubernetes.Cluster

Service Impact and Root Cause Analysis

When combined with the Zenoss Service Dynamics product, this ZenPack adds built-in service impact and root cause analysis capabilities. The service impact relationships shown in the diagram (right) and described below are automatically added and maintained. These will be included in any services that contain one or more of the explicitly mentioned components.

The following object types would typically be added to Impact services.

  • Kubernetes Containers
  • Linux device associated with a Kubernetes Node

Impact Relationships between Kubernetes Components

  • Cluster: impacts Node, Namespace, GCPcluster(external)
  • Namespace: impacts Service, Pod, PersistentVolume
  • Node: impacts Pod, GuestDevice(external)
  • Service: impacts Pod
  • Pod: impacts Container
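
The impact relationships above form a directed graph. As an aid to reading them, the sketch below encodes the list as a dictionary and computes everything that is directly or indirectly impacted by a given component type. It restates the list only and does not describe how Zenoss Service Dynamics evaluates impact internally. The names IMPACTS and downstream are chosen for this example.

```python
# Direct impact relationships, copied from the list above.
IMPACTS = {
    "Cluster": ["Node", "Namespace", "GCPcluster(external)"],
    "Namespace": ["Service", "Pod", "PersistentVolume"],
    "Node": ["Pod", "GuestDevice(external)"],
    "Service": ["Pod"],
    "Pod": ["Container"],
}


def downstream(component):
    """Return every component type reachable from component."""
    seen = set()
    stack = [component]
    while stack:
        current = stack.pop()
        for child in IMPACTS.get(current, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen
```

For instance, a problem on a Node propagates to its Pods and, through them, to their Containers.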

Appendix: Kubernetes RBAC Setup

To enable the Core Metrics Service and grant RBAC access permissions to other components, save the YAML below as zenoss_rbac.yaml, preserving its formatting exactly, and apply it as referenced in Usage:

kubectl apply -f zenoss_rbac.yaml

Contents of zenoss_rbac.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: zenoss
  namespace: default
secrets:
- name: zenoss-secret
---
apiVersion: v1
kind: Secret
metadata:
  name: zenoss-secret
  annotations:
    kubernetes.io/service-account.name: zenoss
type: kubernetes.io/service-account-token
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: zenoss-role
rules:
- apiGroups: [""]
  resources:
    - clusterrolebindings
    - componentstatus
    - configmaps
    - controllerrevisions
    - cronjobs
    - daemonsets
    - deployments
    - endpoints
    - events
    - horizontalpodautoscalers
    - ingress
    - jobs
    - limitranges
    - namespaces
    - networkpolicys
    - nodes
    - persistentvolumeclaims
    - persistentvolumes
    - pods
    - poddisruptionbudgets
    - podsecuritypolicys
    - podtemplates
    - replicasets
    - replicationcontroller
    - rolebindings
    - secrets
    - services
    - serviceaccounts
    - statefulsets
    - status
    - storageclasses
    - nodemetrics
  verbs: ["get", "watch", "list"]
- apiGroups: ["metrics.k8s.io"]
  resources:
    - nodes
    - pods
  verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: zenoss-role-binding
roleRef:
  kind: ClusterRole
  name: zenoss-role
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: zenoss
  namespace: default

Appendix: Identifying Master Nodes

Identification of master nodes can sometimes fail. If your master nodes are being identified as non-master, you can set the following label in the node metadata:

master: "true"

In GCP, this is edited in the UI:

    Kubernetes Engine -> Cluster -> Node -> YAML -> Edit

In kubectl, you can edit the node YAML directly:

    kubectl edit node ${NODE_NAME}

You should end up with something like this:

apiVersion: v1
kind: Node
metadata:
  annotations:
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: 2018-06-25T20:55:33Z
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/fluentd-ds-ready: "true"
    beta.kubernetes.io/instance-type: g1-small
    beta.kubernetes.io/os: linux
    cloud.google.com/gke-nodepool: default-pool
    failure-domain.beta.kubernetes.io/region: us-central1
    failure-domain.beta.kubernetes.io/zone: us-central1-a
    kubernetes.io/hostname: gke-cluster-1-default-pool-fc3e27a3-2mmx
    master: "true"
spec:
  ... etc ...
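
After editing, you may want to confirm which nodes carry the label. The sketch below is an illustration, not part of the ZenPack: it reads the output of kubectl get nodes -o json and lists the names of nodes whose metadata.labels contain master: "true". The function name is hypothetical.

```python
import json


def nodes_with_master_label(nodes_json):
    """Return names of nodes labeled master: "true".

    nodes_json is the text produced by `kubectl get nodes -o json`.
    """
    nodes = json.loads(nodes_json)["items"]
    return [
        node["metadata"]["name"]
        for node in nodes
        if node["metadata"].get("labels", {}).get("master") == "true"
    ]
```

If a node you expect to be a master is missing from the result, check that the label value is the quoted string "true" rather than a YAML boolean.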

Changes

1.0.1

  • Fix install issue with Zenoss 6.2.0 (ZPS-4674)
  • Tested with Zenoss 6.2.1, Zenoss Cloud and Impact 5.3.1

1.0.0

  • Initial Release
  • Tested with Zenoss 6.2.1, Zenoss Cloud and Impact 5.3.1

Commercial

This ZenPack is developed and supported by Zenoss Inc. Commercial ZenPacks are available to Zenoss commercial customers only. Contact Zenoss to request more information regarding this or any other ZenPacks.