
Reliable Kubernetes on a Raspberry Pi Cluster: Monitoring

Scott Jones
Published in CodeX · Jan 16, 2021


If you have followed the previous articles in this series, you will have a 3-node k3s cluster running on Raspberry Pis. But how do you know it's working? How do you know when a node is struggling, or worse, down? Monitoring has been crucial to the health of my cluster, pointing out the pain points where improvements were needed. In this article I will work through how I set it up so you can follow along with your own cluster.

Part 1: Introduction
Part 2: The Foundations
Part 3: Storage
Part 4: Monitoring
Part 5: Security

Node-Exporter

The first critical piece of the puzzle is a way of exporting health information from each node. Node-Exporter is a brilliant utility you can install as a service and leave running; it exposes hardware and OS metrics (CPU, memory, disk, network) over HTTP in a format Prometheus can scrape. You want to do this on every node in your cluster.
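For reference, once it is running, Node-Exporter serves plain-text metrics on port 9100 (its default). A few sample lines, with illustrative values only:

node_cpu_seconds_total{cpu="0",mode="idle"} 389218.54
node_memory_MemAvailable_bytes 2.1473e+09
node_filesystem_avail_bytes{mountpoint="/"} 1.0225e+10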

First, you need to download and extract it.

$ curl -SL https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.linux-armv7.tar.gz > node_exporter.tar.gz && sudo tar -xvf node_exporter.tar.gz -C /usr/local/bin/ --strip-components=1
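Note that this grabs the armv7 build. If your Pis are running a 64-bit OS, you will likely want the linux-arm64 archive from the same releases page instead; a quick way to check which one you need:

$ uname -m
# armv7l  -> linux-armv7 build
# aarch64 -> linux-arm64 build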

Once we have it, we need to set it up as a systemd service to ensure it restarts after a reboot. Create a file /etc/systemd/system/nodeexporter.service:

[Unit]
Description=NodeExporter
[Service]
TimeoutStartSec=0
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
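Optionally, if you want systemd to bring the exporter back up should it ever crash, you can also add a restart policy under the [Service] section (an optional extra, not required for the rest of this guide):

Restart=on-failure
RestartSec=5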

Then you need to register it:

$ sudo systemctl daemon-reload \
&& sudo systemctl enable nodeexporter \
&& sudo systemctl start nodeexporter
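Before moving on, it is worth confirming the service is healthy and actually serving metrics. Assuming the defaults above:

$ sudo systemctl status nodeexporter
$ curl -s http://localhost:9100/metrics | head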

Prometheus

We now need to configure and deploy Prometheus to our cluster. Thankfully, this is as simple as a single file. Create prometheus.yaml as below:

apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  labels:
    app: prometheus
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yaml: |
    global:
      scrape_interval: 15s
      external_labels:
        monitor: 'k3s-monitor'
    scrape_configs:
      - job_name: 'prometheus'
        scrape_interval: 5s
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'K3S'
        static_configs:
          - targets: ['k3s-master:9100', 'k3s-node1:9100', 'k3s-node2:9100']
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deployment
  namespace: monitoring
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus
          volumeMounts:
            - name: config-volume
              mountPath: /etc/prometheus/prometheus.yml
              subPath: prometheus.yaml
          ports:
            - containerPort: 9090
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
---
kind: Service
apiVersion: v1
metadata:
  namespace: monitoring
  name: prometheus-service
spec:
  selector:
    app: prometheus
  ports:
    - name: promui
      protocol: TCP
      port: 9090
      targetPort: 9090

Apply this in the usual way:

$ sudo kubectl apply -f prometheus.yaml

This sets up our scrape config for Prometheus and deploys it into our cluster in the monitoring namespace. We create a service so other pods in the cluster can access it, but we don't give it a load balancer or ingress route because we explicitly do not want to expose it outside the cluster. Check it is up and running with the following command:

$ sudo kubectl get pods -n monitoring

If all has been successful, you will get output like the one below, with the Prometheus pod shown as Running.

Successfully running prometheus
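Since the service is only reachable from inside the cluster, a temporary port-forward is a handy way to open the Prometheus UI and confirm all three node targets are up (a quick check, assuming the service name from the manifest above):

$ sudo kubectl port-forward -n monitoring svc/prometheus-service 9090:9090
# then browse to http://localhost:9090/targets on that machine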

Now that we have something collating all our data, we need something to display it.

Grafana

Grafana is the perfect tool for visualizing all this data from within our cluster. It has built-in support for Prometheus, and a large library of community-created dashboards covering most of the standard monitoring you are going to want to do.

First things first, we need to create our manifest file, grafana.yaml:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana-nfs-volume
  namespace: monitoring
  labels:
    directory: grafana
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: slow
  nfs:
    path: <<NFS-SERVER-PATH>>
    server: <<NFS-SERVER-CLUSTER-IP>>
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-nfs-claim
  namespace: monitoring
spec:
  storageClassName: slow
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  selector:
    matchLabels:
      directory: grafana
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
  labels:
    app: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
      name: grafana
    spec:
      securityContext:
        runAsUser: 1001
        runAsGroup: 1001
      containers:
        - name: grafana
          image: grafana/grafana
          imagePullPolicy: Always
          volumeMounts:
            - name: grafana-nfs-volume
              mountPath: "/var/lib/grafana"
      volumes:
        - name: grafana-nfs-volume
          persistentVolumeClaim:
            claimName: grafana-nfs-claim
---
apiVersion: v1
kind: Service
metadata:
  name: grafana-service
  namespace: monitoring
spec:
  selector:
    app: grafana
  ports:
    - port: 3000
      name: grafana
      protocol: TCP
      targetPort: 3000
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: grafana-route
  namespace: monitoring
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`grafana.internal`)
      kind: Rule
      services:
        - name: grafana-service
          port: 3000
  tls:
    certResolver: cloudflare

There is one thing you will need to update in the above YAML: the persistent storage. Replace the <<NFS-SERVER-PATH>> and <<NFS-SERVER-CLUSTER-IP>> placeholders with the export path and cluster IP of the NFS server we set up earlier.
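For example, the nfs block in the PersistentVolume might end up looking something like this (hypothetical values, substitute your own):

nfs:
  path: /mnt/storage/grafana
  server: 10.43.0.15

Once you have done this, apply it as normal: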

$ sudo kubectl apply -f grafana.yaml

When it has successfully started up, listing the pods will show one more running pod:

$ sudo kubectl get pods -n monitoring
Get pods output for 2 running pods

You should now be able to go to https://grafana.internal and see a running Grafana instance.

Grafana login page
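If the page does not load, it is worth confirming that the Traefik route was actually created (a quick check, assuming the Traefik CRDs set up in the earlier articles):

$ sudo kubectl get ingressroute -n monitoring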

The default username and password are admin and admin. This will need changing, and obviously there is more you can do to secure the instance properly (a dedicated user, OAuth, and so on), but securing it further is out of scope for this article.

The first thing you want to do here is create a new data source; there is a button on the home dashboard to do that. You want a Prometheus data source, and the only thing you have to input is the URL: http://prometheus-service:9090. Hit Save & Test and you should see a success message.

Successfully adding a data source
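If you would rather keep this configuration in version control than click through the UI, Grafana can also load data sources from provisioning files mounted under /etc/grafana/provisioning/datasources/. A minimal sketch of such a file (this provisioning approach is an alternative, not part of the manifests above):

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-service:9090
    isDefault: true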

Once we have a data source, we need a dashboard to display it. Head over to the dashboard import page (https://grafana.internal/dashboard/import). We are going to use the Prometheus Node Exporter Full dashboard (https://grafana.com/grafana/dashboards/1860), so drop that URL (or just its ID, 1860) into the import box and hit Load.

Importing our dashboard

Select your Prometheus data source, hit Import, and there we have it: a dashboard with which we can monitor the state of our cluster. Be sure to mark it as a favorite so it is easier to find later.

Working dashboard
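The dashboard covers most day-to-day needs, but you can also query Prometheus directly when digging into a specific problem. For example, this PromQL expression (using metric names exposed by Node-Exporter) shows available memory as a fraction of total on each node:

node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes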

We now have a way of viewing cluster health and a starting point for diagnosis should anything actually go awry in the cluster. It is not the full picture, since we do not yet have application-specific metrics, but node health is a major step towards being your own sysadmin! Next time we will take a look at security, ensuring access to the things running on your cluster can be configured and locked down.
