Verified Commit f9d8b69a authored by David Hoese

Add initial prometheus rules and alerts

parent 9e8ac016
@@ -61,9 +61,15 @@ https://github.com/kubernetes/ingress-nginx/tree/master/charts/ingress-nginx
```bash
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install -n kube-system ingress-nginx ingress-nginx/ingress-nginx --set controller.metrics.enabled=true --set controller.metrics.serviceMonitor.enabled=true --set controller.metrics.serviceMonitor.namespace="monitoring" --set controller.metrics.serviceMonitor.additionalLabels.release="prometheus-operator"
```
Note the above includes enabling metric gathering for a Prometheus server.
We enable the metrics endpoint on the controller, then enable the
ServiceMonitor, which is a Prometheus resource that tells Prometheus about the
metrics. We also add an extra label for kubekorner's particular installation
of Prometheus so our ServiceMonitor can be found automatically.
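If everything worked, the new ServiceMonitor should show up in the "monitoring"
namespace with the extra label we added. A quick, optional sanity check
(assuming the Prometheus Operator CRDs are already installed; the exact
resource name depends on the release name):
```bash
# List ServiceMonitor objects in the monitoring namespace along with their labels;
# the ingress-nginx entry should carry release=prometheus-operator.
kubectl -n monitoring get servicemonitors --show-labels

# Or select only ServiceMonitors carrying the label we added via --set.
kubectl -n monitoring get servicemonitors -l release=prometheus-operator
```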
### Local Path Configuration
When running on a K3S-based (rancher) cluster like the one currently running
@@ -285,10 +291,10 @@ Prometheus Operator will install its own custom resource definitions (CRDs)
to allow other applications to create their own ways of interacting with
Prometheus.
To install this on the Kubekorner K3s cluster we will use the
prometheus-community kube-prometheus-stack helm chart maintained by the helm community:
https://github.com/prometheus-community/helm-charts
First we will create a namespace specifically for prometheus:
@@ -296,25 +302,173 @@ First we will create a namespace specifically for prometheus:
kubectl create namespace monitoring
```
If your helm installation doesn't already have the necessary chart
repositories, they can be added by doing:
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add stable https://kubernetes-charts.storage.googleapis.com/
helm repo update
```
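As an optional sanity check, we can confirm helm can now see the chart we are
about to install:
```bash
# Search the newly added repositories for the kube-prometheus-stack chart.
helm search repo prometheus-community/kube-prometheus-stack
```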
Then we will install the helm chart in that namespace with the release name
"prometheus-operator".
```bash
helm install -n monitoring prometheus-operator prometheus-community/kube-prometheus-stack
```
Also note at the time of writing this installation results in some warnings:
```
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
```
This is described in a GitHub issue here: https://github.com/helm/charts/issues/17511
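As a rough check that the installation succeeded, we can make sure the chart's
pods start and that the operator's custom resource definitions were registered
(pod names will vary with chart version and release name):
```bash
# All pods created by the chart (operator, prometheus, alertmanager, grafana,
# exporters) should eventually reach the Running state.
kubectl -n monitoring get pods

# The CRDs used later in this document (Prometheus, Alertmanager, ServiceMonitor,
# PrometheusRule, ...) should now exist in the cluster.
kubectl get crds | grep monitoring.coreos.com
```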
### Customizing Prometheus rules
In order to get the most out of Prometheus, it is a good idea to set up rules
for alerts to send to the AlertManager servers created by Prometheus. We can
then configure AlertManager to notify our development team of different
conditions if needed.
First, we need to create a set of rules that we want to be notified about. To
configure these we create one or more `PrometheusRule` objects. Here is an
example:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    app: kube-prometheus-stack
    release: prometheus-operator
  name: prometheus-example-rules
spec:
  groups:
  - name: ./example.rules
    rules:
    - alert: ExampleAlert
      expr: vector(1)
```
This creates an alert called "ExampleAlert" that fires when `expr` is true.
In this case `vector(1)` is effectively always true. The `expr` is
a PromQL query that has access to any metric recorded by Prometheus.
Normally these rules should be automatically picked up by the Prometheus
server(s) by matching `labels`. By default, the Prometheus Operator installed
above will use the name of the helm chart for `app` and the name of the helm
release for `release` to match against.
To check, run:
```bash
$ kubectl -n monitoring get prometheus/prometheus-operator-kube-p-prometheus -o go-template="{{ .spec.ruleSelector }}"
map[matchLabels:map[app:kube-prometheus-stack release:prometheus-operator]]
```
Although a little cryptic, this is showing:
```yaml
matchLabels:
  app: kube-prometheus-stack
  release: prometheus-operator
```
If the above yaml PrometheusRule configuration was stored in a file called `example_rule.yaml`, we could
deploy it by running:
```bash
kubectl create -n monitoring -f example_rule.yaml
```
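We can confirm the object exists before checking Prometheus itself (the name
matches the `metadata.name` in the yaml above):
```bash
# Show the PrometheusRule object we just created in the monitoring namespace.
kubectl -n monitoring get prometheusrules prometheus-example-rules -o yaml
```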
To check whether our rules are showing up in Prometheus we can forward the
service's port to the cluster node and then forward that port to our local
machine with SSH. Note you'll need to use the name of the service in your
installation.
```bash
kubectl -n monitoring port-forward service/prometheus-operated 9995:9090
```
If we go to `http://localhost:9995/alerts` we will see the current alerts
Prometheus is aware of. We can click on "Graph" at the top and try out the
PromQL queries that we might want to use in our other rules.
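The same information is also available from Prometheus' HTTP API over the
forwarded port, which can be useful for scripting. A small sketch (the port
matches the port-forward above):
```bash
# List the rule groups and alerting rules Prometheus has loaded.
curl -s http://localhost:9995/api/v1/rules | python3 -m json.tool

# Run an ad-hoc PromQL query; here the always-true expression from the example rule.
curl -s 'http://localhost:9995/api/v1/query?query=vector(1)' | python3 -m json.tool
```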
We can do a similar check for firing alerts in the alertmanager by forwarding
another port:
```bash
kubectl -n monitoring port-forward service/prometheus-operator-kube-p-alertmanager 9993:9093
```
And going to `http://localhost:9993`.
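Alertmanager exposes an HTTP API on the same port, so active alerts can also be
listed without the web UI. A sketch, assuming the v2 API:
```bash
# List the alerts alertmanager currently knows about over the forwarded port.
curl -s http://localhost:9993/api/v2/alerts | python3 -m json.tool
```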
### Customizing Prometheus Alerts
Now that the rules should have been picked up, we need to configure the
alertmanager to do something when these alerts are fired. The below
instructions are one approach to configuring the alertmanager. The available
methods are changing over time as the prometheus community evolves the helm
chart used above. Other solutions may involve ConfigMap resources or mounting
additional volumes for alertmanager. The below approach is the simplest but
does require "upgrading" the Prometheus Operator installation whenever the
configuration changes.
To configure how alerts are handled by alertmanager we need to modify the
alertmanager configuration. Below we've embedded our alertmanager
configuration in a YAML file that we will provide to our helm chart upgrade
as the new "values" file.
```yaml
alertmanager:
  ## Alertmanager configuration directives
  ## ref: https://prometheus.io/docs/alerting/configuration/#configuration-file
  ##      https://prometheus.io/webtools/alerting/routing-tree-editor/
  ##
  config:
    global:
      resolve_timeout: 5m
      slack_api_url: "https://hooks.slack.com/services/blah/blah/blah"
    route:
      group_by: ["instance", "severity"]
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: "null"
      routes:
      - match:
          alertname: ExampleAlert
        receiver: "geosphere-dev-team"
    receivers:
    - name: "null"
    - name: "geosphere-dev-team"
      slack_configs:
      - channel: "#geo2grid"
        text: "summary: {{ .CommonAnnotations.summary }}\ndescription: {{ .CommonAnnotations.description }}"
```
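Before upgrading, the embedded configuration can optionally be syntax-checked
with `amtool`, which ships with alertmanager. The sketch below assumes `yq`
(v4) is available to extract the `alertmanager.config` section from the values
file:
```bash
# Pull only the alertmanager configuration out of the helm values file and
# run amtool's configuration check against it.
yq '.alertmanager.config' custom_prom_values.yaml > /tmp/alertmanager.yaml
amtool check-config /tmp/alertmanager.yaml
```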
To upgrade the prometheus operator installation, assuming the above is in a
file called `custom_prom_values.yaml`:
```bash
helm upgrade --reuse-values -n monitoring -f custom_prom_values.yaml prometheus-operator prometheus-community/kube-prometheus-stack
```
You can verify that the upgrade updated the related secret with:
```bash
kubectl -n monitoring get secrets alertmanager-prometheus-operator-kube-p-alertmanager -o jsonpath="{.data.alertmanager\.yaml}" | base64 -d
```
You should also see the config-reloader for alertmanager eventually pick up
the new config:
```bash
kubectl -n monitoring logs pod/alertmanager-prometheus-operator-kube-p-alertmanager-0 -c config-reloader --tail 50 -f
```
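With the alertmanager port-forward from earlier still running, one way to
exercise the new route end to end is to post a synthetic alert directly to
alertmanager and watch for the Slack notification. This is only a sketch using
alertmanager's v2 API; adjust the labels to match your routing rules:
```bash
# Fire a fake ExampleAlert at alertmanager; it should be routed to the
# geosphere-dev-team receiver and appear in the configured Slack channel.
curl -s -XPOST http://localhost:9993/api/v2/alerts \
  -H "Content-Type: application/json" \
  -d '[{"labels": {"alertname": "ExampleAlert", "severity": "warning", "instance": "test"},
        "annotations": {"summary": "Test alert", "description": "Manually posted test alert"}}]'
```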
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    app: kube-prometheus-stack
    release: prometheus-operator
  name: prometheus-example-rules
spec:
  groups:
  - name: ./example.rules
    rules:
    - alert: ExampleAlert
      expr: vector(1)
      labels:
        severity: warning
      annotations:
        summary: "Example Alert"
        description: "A test prometheus rule that always fires"
alertmanager:
  ## Alertmanager configuration directives
  ## ref: https://prometheus.io/docs/alerting/configuration/#configuration-file
  ##      https://prometheus.io/webtools/alerting/routing-tree-editor/
  ##
  config:
    global:
      resolve_timeout: 5m
      slack_api_url: "FIXME: <https://hooks.slack.com/services/...>"
    route:
      group_by: ["instance", "severity"]
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: "null"
      routes:
      - match_re:
          ruleGroup: "geosphere-.*"
        receiver: "geosphere-dev-team"
    receivers:
    - name: "null"
    - name: "geosphere-dev-team"
      slack_configs:
      - channel: "#geo2grid"
        send_resolved: true
        icon_emoji: '{{ if eq .Status "firing" }}:fearful:{{ else }}:excellent:{{ end }}'
        color: '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}'
        title: '[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}'
        text: |-
          {{ range .Alerts }}
          *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
          *Description:* {{ .Annotations.description }}
          *Details:*
          {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
          {{ end }}
          {{ end }}