# Cluster Resource Administration

This directory includes Kubernetes resources that should be installed on
Kubernetes clusters that will have GeoSphere deployed on them. While there may
be local cluster builtin equivalents to the resources defined here, these
builtin names are not used in the current configuration in this deploy
repository. The builtin resources could be used instead of installing the
resources defined in this directory by updating the `values-X.yaml` files in
the various directories and the `.gitlab-ci.yml` configuration file.

## K3s - Kubekorner

### Nginx Ingress Controller

At the time of writing, K3s comes with the traefik ingress controller with a
version less than 2.0. It is our (geosphere project) experience that this
controller is buggy and doesn't handle HTTPS certificates in an expected way.
We've chosen to uninstall the traefik controller and install the nginx ingress
controller instead. It is possible that newer versions of traefik (2.3+ is
available but not supported by k3s) will not have the issues we've run into.
It is also possible nginx will be offered by K3s as an alternative ingress
option.

The [k3s FAQ](https://rancher.com/docs/k3s/latest/en/faq/) includes the
following:

```text
How can I use my own Ingress instead of Traefik?
Simply start K3s server with --disable traefik and deploy your ingress.
```

After further research we discovered that additional steps may be required.
See https://github.com/rancher/k3s/issues/1160#issuecomment-561572618:

```text
For the record and future me, this is what needs to be done to disable Traefik during initial setup:

Remove traefik helm chart resource:
  kubectl -n kube-system delete helmcharts.helm.cattle.io traefik
Stop the k3s service:
  sudo service k3s stop
Edit service file
  sudo nano /etc/systemd/system/k3s.service
and add this line to ExecStart:
  --no-deploy traefik \
Reload the service file:
  sudo systemctl daemon-reload
Remove the manifest file from auto-deploy folder:
  sudo rm /var/lib/rancher/k3s/server/manifests/traefik.yaml
Start the k3s service:
  sudo service k3s start
```

Note that the `--no-deploy` flag used above is deprecated and `--disable`
should be used instead. Alternatively, the K3s install script can be rerun with
the `--disable traefik` flag added:

```bash
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --disable traefik --write-kubeconfig-mode 644" sh
```

Then nginx can be installed by following the instructions and settings
described here:

https://github.com/kubernetes/ingress-nginx/tree/master/charts/ingress-nginx

```bash
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install -n kube-system ingress-nginx ingress-nginx/ingress-nginx --set controller.metrics.enabled=true
```

### Local Path Configuration

When running on a K3s-based (rancher) cluster like the one currently running
on kubekorner.ssec.wisc.edu, the local path provisioner should be updated to
point to larger storage paths. K3s comes with a local path provisioner as the
default storage provisioner. This means that when an application asks for
generic storage (PersistentVolumeClaim), this provisioner will be used to find
and provide the storage. However, by default this provisioner is configured to
give access to `/var/lib/rancher/k3s/storage`, which is typically space limited.
By modifying the `config.json` stored in the `local-path-config` ConfigMap, we
can tell the provisioner where storage should be provided from for each node.
See
https://github.com/rancher/local-path-provisioner/blob/master/README.md#configuration
for more information.

To apply:

```bash
# Build a merge patch that nests the JSON under data."config.json"
echo -e "data:\n  config.json: |-" > tmp.yaml
# Indent the JSON deeper than the key so it is parsed as the block scalar body
awk '{ print "    " $0 }' k3s-local-path-config.json >> tmp.yaml
# dry run
kubectl patch -n kube-system cm/local-path-config --type merge --patch "$(cat tmp.yaml)" --dry-run=client
# not dry run
kubectl patch -n kube-system cm/local-path-config --type merge --patch "$(cat tmp.yaml)"
```

### MinIO - Local S3 storage

For easy data storage using an S3 interface we install MinIO on our K3s
cluster. This takes advantage of the local path provisioner we configured
above, so that MinIO is not limited to the couple hundred gigabytes available
in the default location.

To do the initial MinIO installation, run the following in a bash terminal on
the cluster:

```bash
namespace="geosphere-test"
helm upgrade -v 2 --install -f admin/values-geosphere-minio.yaml \
    --set accessKey=false --set secretKey=false \
    -n $namespace geosphere-minio stable/minio
```

The values YAML file provides configuration information specific to this MinIO
installation. Setting `accessKey` and `secretKey` to `false` causes the helm
chart to generate random values for them. These values are then used by the
application to authenticate to the S3 storage. Because of this, it is important
that the "release" be called "geosphere-minio" as above so the various parts of
this installation can be found by the geosphere application.

Note, if your helm installation doesn't already have the stable chart
repository added you may need to do:

```bash
helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm repo update
```

The stable repository has since been archived. If the URL above no longer
resolves, the archived charts are served from `https://charts.helm.sh/stable`.

Next, we need to configure life cycle policies for the MinIO buckets so that
they are automatically cleared of old data.
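
A lifecycle policy is a JSON document in the S3 lifecycle configuration format.
The contents of `admin/abi-netcdf-bucket-lifecycle.json` are not reproduced in
this README, so the following is only a minimal sketch of what such a file can
look like. The file name `example-bucket-lifecycle.json`, the rule ID, and the
7-day expiration are all assumptions chosen for illustration:

```bash
# Hypothetical example only; the real rules live in
# admin/abi-netcdf-bucket-lifecycle.json and may differ.
cat > example-bucket-lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-old-data",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": 7 }
    }
  ]
}
EOF
```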
On the cluster run:

```bash
namespace="geosphere-test"
# Read the generated MinIO credentials from the release's secret
ak=$(kubectl get secret -n "$namespace" geosphere-minio -o jsonpath="{.data.accesskey}" | base64 -d)
sk=$(kubectl get secret -n "$namespace" geosphere-minio -o jsonpath="{.data.secretkey}" | base64 -d)
# Use the raw URL so curl downloads the JSON file rather than the HTML page
curl -O "https://gitlab.ssec.wisc.edu/cspp_geo/geosphere/geosphere-deploy/-/raw/master/admin/abi-netcdf-bucket-lifecycle.json"
for bucket in g16-abi-l1b-netcdf g17-abi-l1b-netcdf; do
    kubectl run -n "$namespace" \
        --env=AWS_ACCESS_KEY_ID="$ak" --env=AWS_SECRET_ACCESS_KEY="$sk" \
        --restart=Never --rm -it --image=amazon/aws-cli set-bucket-lifecycle -- \
        --endpoint-url "http://geosphere-minio:9000" \
        s3api put-bucket-lifecycle-configuration --bucket "$bucket" \
        --lifecycle-configuration="$(cat abi-netcdf-bucket-lifecycle.json)"
done
```

#### Upgrading existing MinIO installation

If upgrading an existing installation of MinIO, we must tell the helm chart
what the existing accessKey and secretKey are. Otherwise it will generate new
random values and clients may fall out of sync. To do this, run the following
in bash on the cluster:

```bash
namespace="geosphere-test"
ak=$(kubectl get secret -n "$namespace" geosphere-minio -o jsonpath="{.data.accesskey}" | base64 -d)
sk=$(kubectl get secret -n "$namespace" geosphere-minio -o jsonpath="{.data.secretkey}" | base64 -d)
EXTRA_ARGS="--set accessKey=$ak --set secretKey=$sk"
# EXTRA_ARGS is intentionally unquoted so it expands to separate arguments
helm upgrade -v 2 --install -f admin/values-geosphere-minio.yaml $EXTRA_ARGS -n $namespace geosphere-minio stable/minio
```

Note, `geosphere-minio` in the above commands must match the name of the
release from the original installation.

### Longhorn - Shared Block Storage

Most cloud platforms have some concept of shared block storage (AWS EBS, GCP
Persistent Disk, etc). These can be mounted as normal volumes in our
containers. Although our K3s installation has a local path provisioner, its
volumes are limited to a single node. We need another solution that shares the
volumes between nodes.
That's where longhorn comes in.

Follow the official longhorn installation instructions:

https://longhorn.io/docs/1.0.0/deploy/install/install-with-helm/

Unless newer versions no longer require it, on kubekorner we needed to install
and enable an iSCSI daemon:

```bash
yum install iscsi-initiator-utils
systemctl enable iscsid
systemctl start iscsid
```

If you have a particular mount on the cluster nodes that has more space than
the default `/var` path, you may want to customize this setting. For longhorn
1.0 you can do this by adding `--set defaultSettings.defaultDataPath=/data` to
your helm install command.

Additionally, if your cluster only has 1 or 2 nodes you may want to change the
default number of replica volumes longhorn attempts to create. Otherwise, by
default, longhorn's "hard affinity" will stop volumes from being created since
it can't make all of the replicas (only one replica per node).

At the time of writing, kubekorner has had its longhorn instance installed
with:

```bash
helm install longhorn ./chart/ --namespace longhorn-system \
    --set persistence.defaultClass=false \
    --set defaultSettings.defaultReplicaCount=1 \
    --set persistence.defaultClassReplicaCount=1 \
    --set ingress.enabled=true \
    --set ingress.host="kubekorner.ssec.wisc.edu" \
    --set defaultSettings.defaultDataPath="/data"
```

Most if not all of these settings can be changed later from the web UI or by
following longhorn's current instructions. If a single-node cluster has more
nodes added in the future, consider increasing the replica count.

### Storage - Local Large Cache

**DEPRECATED**: See local path provisioner above.

This storage class and persistent volume can be used for cases where a
GeoSphere component needs relatively high performance and large capacity
storage. Both the StorageClass and the PersistentVolume are defined in
`local-large-cache.yaml`. This storage is primarily used for GeoSphere's tile
cache (used by MapCache).
It defines large storage that is physically located on or connected to the node
where the pod is being run, or at least performs like it is. The term "large"
here refers to multiple terabytes (3-10TB). While this isn't large in generic
storage terms, it is considered large for a "cache", which is not guaranteed to
persist.

To apply:

```bash
kubectl apply -f local-large-cache.yaml
```

To delete (make unavailable):

```bash
kubectl delete pv/local-large-cache
kubectl delete sc/local-large-cache
```

### Storage - Local Medium Archive

**DEPRECATED**: See local path provisioner above.

Similar to Local Large Cache above, but with more available space. Note this
should only be used for testing as data will be deleted when the claim is
removed.

## Configure HTTPS on Ingress

Web services being served on the cluster via HTTP can be made available via
HTTPS by enabling TLS on the Ingress controller of the cluster. The
instructions below walk through how to enable this.

First, we must create a Secret to store the certificates. For SSEC-based
services, certificates should be requested from Technical Computing (TC). To
create the secret, have the certificate file and key file available in your
current directory and run:

```
kubectl create secret tls mysite-tls-certs --cert=mycert.crt --key=mycert.key
```

Here `mysite-tls-certs` is the name of the secret, `tls` is the type of the
secret, and `mycert.crt` and `mycert.key` are the certificate and key files.
If the certificate is for a specific namespace, add `-n mynamespace`.

Then we need to make sure our Ingress definition includes something like:

```yaml
tls:
  - hosts:
      - mysite.ssec.wisc.edu
    secretName: mysite-tls-certs
```

Once this is deployed, the certificate should be used when requesting the HTTPS
version of your service. You may also want to add the following to force users
to be redirected to HTTPS from HTTP requests.
This is what it looks like in the `values.yaml` file, but it shows up in the
`metadata` section of the `Ingress` definition.

```yaml
ingress:
  annotations:
    ingress.kubernetes.io/ssl-redirect: "true"
```

Note: this annotation applies to the traefik ingress controller and may not be
the same for nginx or other ingress controllers installed on a cluster.

## Monitoring a cluster with Prometheus

A common way to monitor a cluster is to install Prometheus. Prometheus is
itself a separate service for collecting metrics from various sources and
presenting them to the user. A convenient way to get this functionality on a
Kubernetes cluster is by installing
[Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator).
Prometheus Operator will install its own custom resource definitions (CRDs) to
allow other applications to create their own ways of interacting with
Prometheus.

To install this on the Kubekorner K3s cluster we will use the stable
prometheus-operator helm chart maintained by the helm community:

https://github.com/helm/charts/tree/master/stable/prometheus-operator

First we will create a namespace specifically for prometheus:

```bash
kubectl create namespace monitoring
```

Then we will install the helm chart in that namespace with the release name
"prometheus-operator".

```bash
helm install -n monitoring prometheus-operator stable/prometheus-operator
```

Note, if your helm installation doesn't already have the stable chart
repository added you may need to do:

```bash
helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm repo update
```

The stable repository has since been archived. If the URL above no longer
resolves, the archived charts are served from `https://charts.helm.sh/stable`.

Also note that at the time of writing this installation results in some
warnings:

```
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
```

This is described in a GitHub issue here:

https://github.com/helm/charts/issues/17511
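
As an illustration of the CRDs mentioned above, an application can ask
Prometheus to scrape it by creating a `ServiceMonitor` resource. The following
is a minimal sketch, not something defined in this repository: the file name,
the `example-app` name and label, the `metrics` port name, and the
`geosphere-test` namespace selector are placeholders, and the
`release: prometheus-operator` label assumes the chart's default selector
matches ServiceMonitors by release name.

```bash
# Hypothetical manifest; names, labels, and port are placeholders.
cat > example-servicemonitor.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
  labels:
    release: prometheus-operator  # assumed to match the chart's default selector
spec:
  namespaceSelector:
    matchNames:
      - geosphere-test
  selector:
    matchLabels:
      app: example-app
  endpoints:
    - port: metrics
      interval: 30s
EOF
# Once reviewed, it could be applied with:
# kubectl apply -f example-servicemonitor.yaml
```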