Scraping cluster metrics from Talos with kube-prometheus-stack
Recently, I moved my main cluster to Talos instead of k3s on Ubuntu Server. When I re-deployed kube-prometheus-stack, I found that Prometheus wasn't able to scrape metrics from etcd, kube-controller-manager, and kube-scheduler.
For the latter two, the fix is simple: we just need to set the bind address to 0.0.0.0 instead of 127.0.0.1. To do this, in your controlplane Talos configuration, you'll need to set the following:
cluster:
  controllerManager:
    image: registry.k8s.io/kube-controller-manager:v1.27.3
    extraArgs:
      bind-address: 0.0.0.0
  scheduler:
    image: registry.k8s.io/kube-scheduler:v1.27.3
    extraArgs:
      bind-address: 0.0.0.0
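If you manage your Talos configuration with patches rather than editing the full file, the same change can live in a small patch. The following is a minimal sketch, assuming a strategic merge patch for controlplane nodes. The file name bind-address-patch.yaml is hypothetical, and the image lines are left out on the assumption that the versions are already set in your base configuration.

```yaml
# bind-address-patch.yaml (hypothetical file name)
# Strategic merge patch for controlplane nodes: only the keys listed here
# are changed; the rest of the machine config is left as it is.
cluster:
  controllerManager:
    extraArgs:
      bind-address: 0.0.0.0
  scheduler:
    extraArgs:
      bind-address: 0.0.0.0
```

Keep in mind that binding to 0.0.0.0 makes both components listen on every interface of the node, so it's worth checking that their secure ports (10257 for kube-controller-manager, 10259 for kube-scheduler) aren't reachable from outside your network.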
Etcd is unfortunately not as easy. We need to extract the etcd certificates from one of the controlplane nodes and mount them into Prometheus so that it can authenticate against etcd's TLS-protected metrics endpoint.
Unlike what you may be used to, Talos keeps the etcd certificates in /system/secrets/etcd. To get the certificates, you can use a temporary pod to access the host filesystem:
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: kube-system
spec:
  containers:
    - image: busybox
      command:
        - sleep
        - "3600"
      imagePullPolicy: IfNotPresent
      name: busybox
      volumeMounts:
        - mountPath: /system/secrets/etcd
          name: k8setcdcert
  hostNetwork: true
  nodeSelector:
    # Replace with the hostname of one of your controlplane nodes
    kubernetes.io/hostname: talos-1
  tolerations:
    - key: node-role.kubernetes.io/etcd
      operator: Equal
      value: "true"
    # Talos taints controlplane nodes with node-role.kubernetes.io/control-plane
    # (no value), so tolerate the key itself rather than a specific value
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
  volumes:
    - hostPath:
        path: /system/secrets/etcd
        type: ""
      name: k8setcdcert
  restartPolicy: Always
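Then, copy ca.crt, server.crt, and server.key out of the pod. Next, we'll create a secret with those files in the same namespace as Prometheus, stored as etcd-ca.crt, etcd-client.crt, and etcd-client-key.key respectively: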
apiVersion: v1
kind: Secret
metadata:
  name: etcd-certs
type: Opaque
data:
  etcd-ca.crt: |
    ...
  etcd-client.crt: |
    ...
  etcd-client-key.key: |
    ...
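Values under data have to be base64-encoded. If you'd rather paste the PEM files as they are, Kubernetes also accepts stringData, which takes plain text and encodes it for you. Here is a minimal sketch using the same key names. The placeholders in angle brackets stand for the file contents, the mapping of server.crt and server.key to the client entries is my reading of the steps above, and the monitoring namespace is an assumption (use whichever namespace your Prometheus runs in).

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: etcd-certs
  namespace: monitoring  # assumption: must match the namespace Prometheus runs in
type: Opaque
stringData:
  # Plain PEM text; the API server stores it base64-encoded under data.
  etcd-ca.crt: |
    <contents of ca.crt>
  etcd-client.crt: |
    <contents of server.crt>
  etcd-client-key.key: |
    <contents of server.key>
```

Either form ends up as the same secret in the cluster, so the rest of the setup doesn't change.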
Finally, we'll need to mount the secret and update the scrape target in Prometheus. In your kube-prometheus-stack values, update kubeEtcd:
kubeEtcd:
  enabled: true
  endpoints:
    - <your control plane nodes>
  service:
    enabled: true
    port: 2379
    targetPort: 2379
  serviceMonitor:
    scheme: https
    insecureSkipVerify: false
    serverName: localhost
    caFile: /etc/prometheus/secrets/etcd-certs/etcd-ca.crt
    certFile: /etc/prometheus/secrets/etcd-certs/etcd-client.crt
    keyFile: /etc/prometheus/secrets/etcd-certs/etcd-client-key.key
You'll also need to list the secret under prometheus.prometheusSpec so that it gets mounted into the Prometheus pod:
prometheus:
  prometheusSpec:
    secrets:
      - etcd-certs
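To see how the two fragments relate: the Prometheus Operator mounts each secret listed under secrets at /etc/prometheus/secrets/<secret-name>/, with one file per key in the secret. The annotated sketch below repeats only the relevant values to show where the caFile, certFile, and keyFile paths come from. It assumes the secret name and keys used in this post.

```yaml
prometheus:
  prometheusSpec:
    secrets:
      - etcd-certs  # mounted at /etc/prometheus/secrets/etcd-certs/

kubeEtcd:
  serviceMonitor:
    # Each path is <mount directory>/<key in the Secret>.
    caFile: /etc/prometheus/secrets/etcd-certs/etcd-ca.crt
    certFile: /etc/prometheus/secrets/etcd-certs/etcd-client.crt
    keyFile: /etc/prometheus/secrets/etcd-certs/etcd-client-key.key
```

If you rename the secret or any of its keys, the three paths need to change with it.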
And that's it! You should now be able to scrape etcd, kube-controller-manager, and kube-scheduler when running Talos 😊