
Scraping cluster metrics from Talos with kube-prometheus-stack


Recently, I moved my main cluster to Talos instead of k3s on Ubuntu Server. When I redeployed kube-prometheus-stack, I found that Prometheus wasn't able to scrape metrics from etcd, kube-controller-manager, and kube-scheduler.

For the latter two, the fix is simple: we just need to set the bind address to 0.0.0.0 instead of 127.0.0.1 so the metrics ports are reachable from outside the node. To do this, set the following in the cluster section of your control plane Talos machine configuration:

cluster:
  controllerManager:
    image: registry.k8s.io/kube-controller-manager:v1.27.3
    extraArgs:
      bind-address: 0.0.0.0

  scheduler:
    image: registry.k8s.io/kube-scheduler:v1.27.3
    extraArgs:
      bind-address: 0.0.0.0
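
One way to roll this out without editing the full machine config by hand is a patch file. The following is a minimal sketch, assuming the settings above are saved under a top-level `cluster:` key in a file such as `cp-metrics.yaml` (a hypothetical name) and that `talosctl` is already configured for your cluster. The function name and the node addresses in the usage line are placeholders.

```shell
# Apply a machine config patch to each control plane node in turn.
# Usage: patch_controlplane <patch-file> <node-ip>...
patch_controlplane() {
  patch_file="$1"
  shift
  for node in "$@"; do
    # The "@" prefix tells talosctl to read the patch from a file.
    talosctl --nodes "$node" patch machineconfig --patch "@${patch_file}" || return 1
  done
}
```

For example: `patch_controlplane cp-metrics.yaml 10.0.0.11 10.0.0.12`. Patching one node at a time makes it easier to spot a problem before it reaches every control plane node.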

Etcd is unfortunately not as easy. We need to extract the etcd certificates from one of the controlplane nodes and mount them into Prometheus to be able to scrape the metrics port.

Unlike what you may be used to from other distributions, Talos stores the etcd certificates in /system/secrets/etcd. To get them, you can use a temporary pod that mounts that directory from the host filesystem:


apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: kube-system
spec:
  containers:
  - image: busybox
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    name: busybox
    volumeMounts:
    - mountPath: /system/secrets/etcd
      name: k8setcdcert
  hostNetwork: true
  nodeSelector:
    kubernetes.io/hostname: talos-1
  tolerations:
    - key: node-role.kubernetes.io/etcd
      operator: Equal
      value: "true"
    # The control plane taint has no value, so match on the key only
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule
  volumes:
  - hostPath:
      path: /system/secrets/etcd
      type: ""
    name: k8setcdcert
  restartPolicy: Always

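Then, copy ca.crt, server.crt, and server.key out of the pod. Next, we'll create a secret with those files in the same namespace as Prometheus. Since the values sit under data, each one must be base64-encoded: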

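apiVersion: v1
kind: Secret
metadata:
    name: etcd-certs
type: Opaque
data:
    # base64-encoded contents of ca.crt
    etcd-ca.crt: |
        ...
    # base64-encoded contents of server.crt
    etcd-client.crt: |
        ...
    # base64-encoded contents of server.key
    etcd-client-key.key: |
        ...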
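
If you'd rather not copy and base64-encode the files by hand, the same two steps can be sketched with kubectl. This assumes the busybox pod above is running in kube-system and that the three files are readable from inside it. The function names are made up for illustration, and the namespace you pass in is whichever one your Prometheus runs in (`monitoring` in the usage line is only an assumption).

```shell
# Copy the three etcd files out of the helper pod into the current directory.
fetch_etcd_certs() {
  for f in ca.crt server.crt server.key; do
    kubectl -n kube-system exec busybox -- cat "/system/secrets/etcd/${f}" > "${f}" || return 1
  done
}

# Create the etcd-certs secret, mapping each local file to the key name
# used in the Prometheus configuration. kubectl does the base64 encoding.
# Usage: create_etcd_secret <namespace>
create_etcd_secret() {
  kubectl -n "$1" create secret generic etcd-certs \
    --from-file=etcd-ca.crt=ca.crt \
    --from-file=etcd-client.crt=server.crt \
    --from-file=etcd-client-key.key=server.key
}
```

For example: `fetch_etcd_certs && create_etcd_secret monitoring`. Once the secret exists, it's a good idea to delete the busybox pod and the local copy of server.key.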

Finally, we'll need to mount the secret and update the scrape target in Prometheus. In your kube-prometheus-stack values, update kubeEtcd:

    kubeEtcd:
      enabled: true
      endpoints:
        - <your control plane nodes>
      service:
        enabled: true
        port: 2379
        targetPort: 2379
      serviceMonitor:
        scheme: https
        insecureSkipVerify: false
        serverName: localhost
        caFile: /etc/prometheus/secrets/etcd-certs/etcd-ca.crt
        certFile: /etc/prometheus/secrets/etcd-certs/etcd-client.crt
        keyFile: /etc/prometheus/secrets/etcd-certs/etcd-client-key.key

You'll also need to list the secret under prometheus.prometheusSpec so that it gets mounted into the Prometheus pod at /etc/prometheus/secrets/etcd-certs:

      prometheusSpec:
        secrets:
          - etcd-certs
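
If the etcd target still shows as down afterwards, it can help to test the certificates outside Prometheus first. Here is a rough sketch, assuming ca.crt, server.crt, and server.key are in the current directory and `curl` is installed. The function name is hypothetical.

```shell
# Request the etcd metrics endpoint using the copied certificate and key,
# and print the first few lines of the response.
# Usage: check_etcd_metrics <node-ip>
check_etcd_metrics() {
  curl --silent --show-error --fail \
    --cacert ca.crt --cert server.crt --key server.key \
    "https://$1:2379/metrics" | head -n 5
}
```

If curl reports a certificate name mismatch, the address you used may not be among the names in server.crt. That is not necessarily a problem for Prometheus, since the configuration above verifies the certificate against `serverName: localhost` rather than the node address.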

And that's it! You should now be able to scrape etcd, kube-controller-manager, and kube-scheduler when running Talos 😊