Application Monitoring on OKD 3.11 with Prometheus and Grafana

duration 60 minutes

Introduction

The following guide has been tested with OKD 3.11/Kabanero 0.2.0.

For application monitoring on OKD (OpenShift Origin Community Distribution), you need to set up your own Prometheus and Grafana deployments. There are two approaches for setting up Prometheus on OKD.

  • Option A - The first approach is via Prometheus Operator and Service Monitor, which is the newest and the most popular way of setting up Prometheus on a Kubernetes cluster.

  • Option B - Use the legacy way of deploying Prometheus on OKD without the Prometheus Operator.

This guide explores both approaches to set up Prometheus on OKD.

Deploy a Sample Application with MP Metrics Endpoint

Prior to deploying Prometheus, ensure that there is a running application that has a service endpoint for outputting metrics in Prometheus format.

It is assumed such a running application has been deployed to the OKD cluster inside a project/namespace called myapp, and that the Prometheus metrics endpoint is exposed on path /metrics.
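
Before deploying Prometheus, it can help to confirm that the endpoint really returns the Prometheus text format. The following is a minimal sketch of one way to do that. The helper name count_metric_samples, the sample payload, and the metric names in it are made up for illustration, and the commented-out curl line assumes a pod address that you would need to fill in.

```shell
#!/bin/sh
# Count the sample lines in Prometheus text-format output read from stdin.
# Lines starting with '#' are HELP/TYPE metadata or comments; blank lines are skipped.
count_metric_samples() {
  grep -v '^#' | grep -c -v '^[[:space:]]*$'
}

# Hypothetical payload standing in for the application's /metrics response.
sample_metrics='# HELP myapp_thread_count Current thread count
# TYPE myapp_thread_count gauge
myapp_thread_count 42
# TYPE myapp_requests_total counter
myapp_requests_total{endpoint="/hello"} 7'

# Two sample lines are present in the payload above.
printf '%s\n' "$sample_metrics" | count_metric_samples

# Against a running pod (the address is an assumption), something like:
#   curl -s http://<pod-ip>:9080/metrics | count_metric_samples
```

A count of zero, or output that is not plain text in this shape, suggests that the endpoint is not yet serving metrics that Prometheus can scrape.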

Option A: Deploy Prometheus - Prometheus Operator

service_monitor.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: myapp-monitor
  name: myapp-monitor
  namespace: myapp
spec:
  endpoints:
    - interval: 30s
      path: /metrics
      port: 9080-tcp
  namespaceSelector:
    matchNames:
      - myapp
  selector:
    matchLabels:
      app: myapp

prometheus.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: prometheus-operator
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: prometheus-operator
spec:
  serviceAccountName: prometheus
  serviceMonitorNamespaceSelector:
    matchLabels:
      prometheus: monitoring
  serviceMonitorSelector:
    matchExpressions:
      - key: k8s-app
        operator: Exists
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: prometheus-operator
spec:
  type: NodePort
  ports:
  - name: web
    port: 9090
    protocol: TCP
    targetPort: web
  selector:
    prometheus: prometheus

prometheus_snippet.yaml

serviceMonitorNamespaceSelector:
  matchLabels:
    prometheus: monitoring
serviceMonitorSelector:
  matchExpressions:
    - key: k8s-app
      operator: Exists

The Prometheus Operator is an open-source project that originated at CoreOS as part of their Kubernetes Operator offering. The Kubernetes Operator framework is becoming the standard for Prometheus deployments on a Kubernetes system. When the Prometheus Operator is installed on the Kubernetes system, you no longer need to hand-write the Prometheus configuration. Instead, you create a ServiceMonitor resource for each service endpoint that needs to be monitored, which makes daily maintenance of the Prometheus server a lot easier. An architecture overview of the Prometheus Operator is shown below:

[Figure: Prometheus Operator architecture overview]

There are two ways to install the Prometheus Operator:

  1. Through the Operator Lifecycle Manager (OLM), which is still in its technology preview phase in release 3.11 of OKD.

  2. By following the guide from the Prometheus Operator git repository.

Since OLM is still in its technology preview phase, this guide shows the installation without OLM. The guide will be updated with the OLM approach when Kabanero officially adopts OKD 4.x.

Prometheus Operator Installation

The following procedure is based on the Prometheus Getting Started guide maintained by the CoreOS team, with the addition of the OpenShift commands needed to complete each step.

  1. Clone the Prometheus Operator repository

    git clone https://github.com/coreos/prometheus-operator
  2. Create a new namespace for our Prometheus Operator deployment.

    oc new-project prometheus-operator
  3. Open the bundle.yaml file and change all instances of namespace: default to the newly created namespace, namespace: prometheus-operator.

  4. Add the line - --deny-namespaces=openshift-monitoring to the existing containers args section of the Prometheus Operator's Deployment definition in the bundle.yaml file. The --deny-namespaces argument excludes the listed namespaces from those watched by the Prometheus Operator. By default, the Prometheus Operator oversees Prometheus deployments across all namespaces, which can be problematic when there are multiple Prometheus Operator deployments on the OKD cluster. For instance, OKD's Cluster Monitoring feature also deploys a Prometheus Operator in the openshift-monitoring namespace. Therefore, exclude the openshift-monitoring namespace from our Prometheus Operator to prevent undesired behavior.

  5. Save the bundle.yaml file and deploy the Prometheus Operator using the following command.

    oc apply -f bundle.yaml

    You may receive an error message like the one below when running the command.

    Error creating: pods "prometheus-operator-5b8bfd696-" is forbidden: unable to validate against any security context constraint: [spec.containers[0].securityContext.securityContext.runAsUser: Invalid value: 65534: must be in the ranges: [1000070000, 1000079999]]

    To correct the error, change the runAsUser: 65534 field in the bundle.yaml file to a valid value that is in the range specified in the error message. In this case, setting runAsUser: 1000070000 in the bundle.yaml would be in the valid range. Save the bundle.yaml file and re-deploy the Prometheus Operator.

    oc delete -f bundle.yaml
    oc apply -f bundle.yaml

    The service_monitor.yaml file defines a ServiceMonitor resource, which describes a service endpoint that the Prometheus instance needs to monitor. In this example, the ServiceMonitor selects services with the label app: myapp in the myapp namespace, and the metrics endpoints to scrape are defined in spec.endpoints. If the metrics endpoint is secured, you can define a secured endpoint with authentication configuration by following the endpoint API documentation of the Prometheus Operator.

    Create the service_monitor.yaml file
  6. Apply the service_monitor.yaml file to create the ServiceMonitor resource.

    oc apply -f service_monitor.yaml
  7. Define a Prometheus resource that can scrape the targets defined in the ServiceMonitor resource. Create a prometheus.yaml file that aggregates all the files from the git repository directory prometheus-operator/example/rbac/prometheus/. NOTE: Make sure to change the namespace: default to namespace: prometheus-operator.

    Create the prometheus.yaml file
  8. Apply the prometheus.yaml file to deploy the Prometheus service. After all the resources are created, apply the Prometheus Operator bundle.yaml file again.

    oc apply -f prometheus.yaml
    oc apply -f bundle.yaml
  9. Verify that the Prometheus services have successfully started. The prometheus-operated service is created automatically by the prometheus-operator, and is used for registering all deployed Prometheus instances.

    oc get svc -n prometheus-operator
    NAME                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
    prometheus            NodePort    172.30.112.199   <none>        9090:30342/TCP   19h
    prometheus-operated   ClusterIP   None             <none>        9090/TCP         19h
    prometheus-operator   ClusterIP   None             <none>        8080/TCP         21h
  10. Expose the prometheus-operated service to use the Prometheus console externally.

    [root@rhel7-okd]# oc expose svc/prometheus-operated -n prometheus-operator
    [root@rhel7-okd]# oc get route -n prometheus-operator
    NAME         HOST/PORT                                                 PATH      SERVICES     PORT      TERMINATION   WILDCARD
    prometheus   prometheus-prometheus-operator.apps.9.37.135.153.nip.io             prometheus   web                     None
  11. Visit the prometheus route and go to the Prometheus targets page. At this point, the page should be empty with no endpoints being discovered.

  12. Look at the prometheus.yaml file and update the serviceMonitorNamespaceSelector and serviceMonitorSelector fields so that they match the prometheus_snippet.yaml file. A ServiceMonitor must satisfy both selectors before the Prometheus service picks it up. In this case, our ServiceMonitor has the k8s-app label, but the target namespace "myapp" is missing the required prometheus: monitoring label.

    Update prometheus.yaml to reflect the prometheus_snippet.yaml file
  13. Add the label to the "myapp" namespace.

    [root@rhel7-okd]# oc label namespace myapp prometheus=monitoring
  14. Check to see that the Prometheus targets page is picking up the target endpoints. If the service endpoint is discovered, but Prometheus is reporting a DOWN status, you need to make the prometheus-operator project globally accessible.

    oc adm pod-network make-projects-global prometheus-operator
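
The manual edits to bundle.yaml in steps 3 and 5 can also be scripted. The following is a minimal sketch, not part of the original procedure: the helper name patch_bundle is hypothetical, the input file is a tiny stand-in rather than the real bundle.yaml, and the runAsUser value is the one taken from the example error message, so it may differ on your cluster. Step 4 (adding --deny-namespaces) is not covered and still needs to be done by hand.

```shell
#!/bin/sh
# Rewrite a bundle.yaml-style manifest for the prometheus-operator namespace.
#   $1: input manifest
#   $2: output manifest
#   $3: runAsUser value that falls inside the range reported by the error message
patch_bundle() {
  sed -E \
    -e 's/^([[:space:]]*)namespace:[[:space:]]*default[[:space:]]*$/\1namespace: prometheus-operator/' \
    -e "s/^([[:space:]]*)runAsUser:[[:space:]]*65534[[:space:]]*\$/\1runAsUser: $3/" \
    "$1" > "$2"
}

# Tiny stand-in for bundle.yaml (NOT the real file), used only to show the effect.
workdir=$(mktemp -d)
cat > "$workdir/bundle.yaml" <<'EOF'
metadata:
  name: prometheus-operator
  namespace: default
spec:
  securityContext:
    runAsUser: 65534
EOF

patch_bundle "$workdir/bundle.yaml" "$workdir/bundle.patched.yaml" 1000070000
cat "$workdir/bundle.patched.yaml"
```

The sketch writes to a new file instead of editing in place, so the original manifest stays available for comparison before you run oc apply.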

Option B: Deploy Prometheus - Legacy deployments

scrape_configs.yaml

# Scrape config for pods.
#
# The relabeling allows the actual pod scrape endpoint to be configured via the
# following pod annotations:
#
# * `prometheus.io/scrape`: Only scrape pods that have a value of `true`.
# * `prometheus.io/path`: Overrides the metrics path if it is not `/metrics`.
# * `prometheus.io/port`: Scrape the pod on the indicated port.
- job_name: 'kubernetes-pods'

  kubernetes_sd_configs:
  - role: pod

  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name

scrape_configs_snippet.yaml

prometheus.io/path: /metrics
prometheus.io/port: '9080'
prometheus.io/scrape: 'true'

Users who have just migrated their applications to OKD and who maintain their own Prometheus configuration file are not limited to the Prometheus Operator. You can deploy Prometheus by using the example yaml file provided by the OKD GitHub repository.

Create a new project called prometheus, then complete the following steps.

    oc new-project prometheus
  1. Deploy Prometheus using the sample prometheus.yaml file from the OKD GitHub repository.

    oc new-app -f https://raw.githubusercontent.com/openshift/origin/master/examples/prometheus/prometheus.yaml -p NAMESPACE=prometheus
  2. Edit the "prometheus" ConfigMap resource from the prometheus namespace.

     oc edit configmap/prometheus -n prometheus
  3. Remove all existing jobs and add the scrape_configs job from the scrape_configs.yaml file.

  4. Kill the existing Prometheus pod, or better yet, reload the Prometheus service gracefully using the command below for the new configuration to take effect.

    oc exec prometheus-0 -c prometheus -- curl -X POST http://localhost:9090/-/reload

    Make sure the monitored application's pods are started with the annotations shown in the scrape_configs_snippet.yaml file, because the scrape_configs job in the prometheus ConfigMap relies on them.
  5. Verify the scrape target is up and available in Prometheus by using Prometheus’s web console as follows: Click Console → Status → Targets.

    If the service endpoint is discovered, but Prometheus is reporting a DOWN status, you need to make the prometheus project globally accessible.

    oc adm pod-network make-projects-global prometheus
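
To see what the __address__ rule in scrape_configs.yaml does with the prometheus.io/port annotation, the rewrite can be emulated locally. The following is a sketch under assumptions: the helper name rewrite_address and the pod IP are made up, and because sed does not support \d or non-capturing groups, the regex is translated to an equivalent POSIX extended form, which is why the replacement uses \1:\3 instead of $1:$2.

```shell
#!/bin/sh
# Emulate the __address__ relabel rule from scrape_configs.yaml.
# Prometheus joins the source label values with ';' and anchors the regex at both ends.
#   $1: discovered pod address (host or host:port)
#   $2: value of the prometheus.io/port annotation
rewrite_address() {
  printf '%s;%s\n' "$1" "$2" |
    sed -E 's/^([^:]+)(:[0-9]+)?;([0-9]+)$/\1:\3/'
}

# The annotated port replaces whatever port was discovered.
rewrite_address 10.128.0.12:8080 9080   # prints 10.128.0.12:9080
# It is also appended when the discovered address has no port.
rewrite_address 10.128.0.12 9080        # prints 10.128.0.12:9080
```

When the annotation is missing, the regex does not match and Prometheus leaves __address__ unchanged. The sketch does not model that case.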

Deploy Grafana

grafana-datasources.yaml

apiVersion: v1
data:
  datasources.yaml: |-
    apiVersion: 1
    datasources:
      - name: "OCP Prometheus"
        type: prometheus
        access: proxy
        url: http://prometheus-operated-monitoring.apps.9.37.135.153.nip.io
        basicAuth: false
        withCredentials: false
        isDefault: true
        jsonData:
            tlsSkipVerify: true
            "httpHeaderName1": "Authorization"
        secureJsonData:
            "httpHeaderValue1": "Bearer [grafana-ocp token]"
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: grafana

Regardless of which approach was used to deploy Prometheus on OKD, use Grafana dashboards to visualize the metrics. Use the sample grafana.yaml file provided by the OKD GitHub repository to install Grafana. NOTE: Perform the following steps to ensure that Prometheus endpoints are reachable as a data source in Grafana.

  1. Create a new project called grafana.

    oc new-project grafana
  2. Deploy Grafana using the grafana.yaml file from the OKD GitHub repository.

    oc new-app -f https://raw.githubusercontent.com/openshift/origin/master/examples/grafana/grafana.yaml -p NAMESPACE=grafana
  3. Grant the grafana service account view access to the prometheus namespace.

    oc policy add-role-to-user view system:serviceaccount:grafana:grafana -n prometheus
  4. For Grafana to add existing Prometheus datasources in OKD, define the datasources in a ConfigMap resource under the grafana namespace. Create a ConfigMap yaml file called grafana-datasources.yaml.

  5. Apply the grafana-datasources.yaml file to create the ConfigMap resource.

    oc apply -f grafana-datasources.yaml
  6. Acquire the [grafana-ocp token] by using the following command.

    oc sa get-token grafana
  7. Add the ConfigMap resource to the Grafana application and mount it to /usr/share/grafana/datasources.

    [Figure: ConfigMap mount path UI]
  8. Save and test the data source. You should see 'Datasource is working'.

    [Figure: Grafana data source UI]
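
One way to place the service account token into the datasource definition is to substitute the placeholder with sed. The following is a sketch, not part of the original steps: the helper name fill_token, the stand-in file, and the token value example-token are hypothetical, and it assumes that the placeholder appears in the file literally as [grafana-ocp token].

```shell
#!/bin/sh
# Replace the [grafana-ocp token] placeholder in a datasource file with a token.
#   $1: input file
#   $2: output file
#   $3: token value (assumed not to contain '|' or '&')
fill_token() {
  sed "s|[[]grafana-ocp token[]]|$3|" "$1" > "$2"
}

# Stand-in fragment of grafana-datasources.yaml, used only to show the effect.
workdir=$(mktemp -d)
cat > "$workdir/grafana-datasources.yaml" <<'EOF'
        secureJsonData:
            "httpHeaderValue1": "Bearer [grafana-ocp token]"
EOF

fill_token "$workdir/grafana-datasources.yaml" \
  "$workdir/grafana-datasources.filled.yaml" "example-token"
cat "$workdir/grafana-datasources.filled.yaml"

# With the real file and token, something like:
#   fill_token grafana-datasources.yaml grafana-datasources.filled.yaml "$(oc sa get-token grafana)"
```

The filled file contains a credential, so treat it accordingly and avoid committing it to source control.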

You can now consume all the application metrics gathered by Prometheus on the Grafana dashboard.

Git clone this repo to get going right away:
git clone https://github.com/Kabanero-io/guide-app-monitoring.git