Prometheus

Tutorial to scrape your application metrics on an h8lio managed cluster, using the Prometheus Operator pattern.

On h8lio you do not install the kube-prometheus-stack chart yourself. The Prometheus Operator and its CRDs (Prometheus, Alertmanager, ServiceMonitor, PodMonitor, Probe, PrometheusRule, ScrapeConfig, ThanosRuler) are installed and reconciled cluster-wide by the platform. Your job is to create instances and scrape definitions as custom resources in your own namespaces, within the RBAC granted to your role.

Role | monitoring.coreos.com resources
Owner / Admin / Operator | full access (create / update / delete Prometheus, Alertmanager, ServiceMonitor, PodMonitor, Probe, PrometheusRule, ThanosRuler)
Developer | read-only (get / list / watch)

Installing the chart would also pull in a node-exporter DaemonSet and cluster-scoped CRDs, which a namespace tenant cannot deploy on a managed cluster. Node and host-level metrics are the platform’s responsibility and are surfaced in the shared Grafana described below.

h8lio provisions a shared Grafana at https://monitoring.h8l.io with a set of read-only dashboards scoped to your organization (CPU, memory, storage per namespace). It is wired automatically when your organization is created, so basic namespace monitoring needs no work from you.

Your own Prometheus + Grafana (custom metrics)

To scrape application metrics (MariaDB, PostgreSQL, your own services) and build your own dashboards and alerts, run your own Prometheus instance. The recommended layout is a dedicated monitoring namespace (an h8lio cluster such as acme-monitoring) that observes your organization’s other namespaces (acme-prod, acme-staging, …). All namespaces of an organization share the label tenant: <organization>, which makes cross-namespace selection a one-liner.

Before you start, you need:

  • kubectl configured for your cluster (see kubectl)
  • An Owner, Admin, or Operator role on the organization
  • A monitoring namespace (cluster) to host the instance, for example acme-monitoring
  • A deployed service exposing a metrics endpoint (typically a Service with a named metrics port)

A Prometheus resource tells the platform Operator to provision and reconcile a Prometheus server for you. It needs a ServiceAccount, and that account needs read access to the discovery objects in every namespace you want to scrape.

Create the instance in your monitoring namespace. The namespace selectors below match every namespace of your organization through its shared tenant label:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: acme-monitoring
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: acme
  namespace: acme-monitoring
spec:
  replicas: 1
  serviceAccountName: prometheus
  retention: 15d
  resources:
    requests:
      memory: 512Mi
  # discover ServiceMonitors / PodMonitors / PrometheusRules across
  # every namespace of your organization (they share the `tenant` label)
  serviceMonitorNamespaceSelector:
    matchLabels:
      tenant: acme
  podMonitorNamespaceSelector:
    matchLabels:
      tenant: acme
  ruleNamespaceSelector:
    matchLabels:
      tenant: acme
  # within those namespaces, select all objects (no label filter)
  serviceMonitorSelector: {}
  podMonitorSelector: {}
  ruleSelector: {}
  # persist the TSDB on a Ceph-backed volume
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: eu-west-fr-gra-block-nvme-ec-ext4
        resources:
          requests:
            storage: 20Gi

The Prometheus pod discovers its targets through the Kubernetes API, so its ServiceAccount needs get / list / watch on the discovery objects in each namespace it scrapes. A namespace-scoped Role (not a ClusterRole) is enough. Apply the pair below in every namespace you collect from, including acme-monitoring itself:

# repeat per scraped namespace: acme-prod, acme-staging, acme-monitoring, ...
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus
  namespace: acme-prod
rules:
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["discovery.k8s.io"]
    resources: ["endpointslices"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus
  namespace: acme-prod
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus
subjects:
  # the instance's ServiceAccount lives in the monitoring namespace
  - kind: ServiceAccount
    name: prometheus
    namespace: acme-monitoring

Once the Operator reconciles the resource, reach the Prometheus UI locally:

kubectl -n acme-monitoring port-forward svc/prometheus-operated 9090:9090

Then open http://localhost:9090 and check Status → Targets.

A ServiceMonitor declaratively selects the Kubernetes Services to scrape. Create it in the application’s namespace; the instance picks it up because that namespace carries the tenant label its selector matches.

Example ServiceMonitor scraping MariaDB metrics (exposed by a mysqld-exporter sidecar):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mariadb
  # the application's namespace
  namespace: acme-prod
spec:
  # Service label whose value becomes the Prometheus job name;
  # falls back to the Service name when the Service has no such label
  jobLabel: mariadb
  endpoints:
    - interval: 15s
      # MariaDB service "metrics" endpoint (named port on the Service)
      port: metrics
  # MariaDB service selector (matches labels on the Service object)
  selector:
    matchLabels:
      app.kubernetes.io/component: primary
      app.kubernetes.io/instance: mariadb
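
For the selector above to match, the Service in front of MariaDB has to carry the same labels and expose a port named metrics. The manifest below is a minimal sketch of such a Service, not the one your chart generates: the name mariadb, the pod selector, and port 9104 (the usual mysqld-exporter default) are assumptions to adapt.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mariadb                 # assumed name
  namespace: acme-prod
  labels:
    # these labels are what the ServiceMonitor selector matches
    app.kubernetes.io/component: primary
    app.kubernetes.io/instance: mariadb
spec:
  # assumed pod labels; use the ones your MariaDB pods actually carry
  selector:
    app.kubernetes.io/component: primary
    app.kubernetes.io/instance: mariadb
  ports:
    - name: mysql
      port: 3306
      targetPort: 3306
    # the ServiceMonitor refers to this port by name, not by number
    - name: metrics
      port: 9104                # assumed exporter port
      targetPort: 9104
```

Because this Service has no label named mariadb, the jobLabel lookup falls back to the Service name, so the scraped series carry job="mariadb".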

See the ServiceMonitor CRD for the full schema. PodMonitor is the equivalent that targets pods directly when there is no Service.
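
As an illustration of the PodMonitor variant, here is a minimal sketch for a workload that has no Service. The name worker, its label, and the 30s interval are hypothetical; the only requirement is that the pod declares a container port named metrics.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: worker                  # hypothetical workload
  namespace: acme-prod
spec:
  # matches labels on the pods themselves, not on a Service
  selector:
    matchLabels:
      app.kubernetes.io/name: worker
  podMetricsEndpoints:
    # name of the container port that serves the metrics
    - port: metrics
      path: /metrics
      interval: 30s
```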

Once the ServiceMonitor is applied, the target and its endpoints appear in the Prometheus UI under Status → Targets. If the target is up, you can run your first PromQL queries against the scraped metrics.

ServiceMonitor and PodMonitor cover in-cluster targets found through Kubernetes service discovery. For targets outside the cluster, or scrape configurations those CRDs cannot express, use the ScrapeConfig CRD (monitoring.coreos.com/v1alpha1). It is complementary: keep ServiceMonitor / PodMonitor as the default for your apps, and reach for ScrapeConfig only for external or static targets and advanced service discovery (cloud, DNS, file-based).
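
A minimal sketch of a static ScrapeConfig, assuming a hypothetical exporter reachable at db.example.com:9104. The Prometheus instance only loads ScrapeConfig objects when its scrapeConfigSelector is set, so the second document shows the fields to merge into the existing Prometheus resource. The role table at the top of this page does not list ScrapeConfig among the writable resources, so check that your role is allowed to create one.

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: external-db             # hypothetical name
  namespace: acme-monitoring
spec:
  staticConfigs:
    - targets:
        - db.example.com:9104   # hypothetical external exporter
      labels:
        service: external-db
  metricsPath: /metrics
---
# fragment to merge into the existing Prometheus resource
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: acme
  namespace: acme-monitoring
spec:
  scrapeConfigNamespaceSelector:
    matchLabels:
      tenant: acme
  scrapeConfigSelector: {}
```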

PrometheusRule defines recording and alerting rules. Create it in the application’s namespace, like the ServiceMonitor. Alerts surface in Alertmanager and can also drive Grafana Alerting.

Example PrometheusRule for the MariaDB service above:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mariadb
  # the application's namespace
  namespace: acme-prod
spec:
  groups:
    # rules group name
    - name: mariadb
      rules:
        # fire when no target of the MariaDB ServiceMonitor job reports up;
        # absent() does not carry the instance label, so the annotations
        # cannot reference {{ $labels.instance }}
        - alert: MariaDB-Down
          annotations:
            message: No MariaDB target is reporting up
            summary: MariaDB instance is down
          expr: absent(up{job="mariadb"} == 1)
          for: 5m
          labels:
            service: mariadb
            severity: warning
        # check if MariaDB has more than 100 active connections, using PromQL
        - alert: HighMariaDBConnections
          annotations:
            description: >-
              MariaDB has more than 100 active connections for more than 5
              minutes.
            summary: High number of MariaDB connections
          expr: mysql_global_status_threads_connected > 100
          for: 5m
          labels:
            severity: warning
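
The example above only contains alerting rules. A recording rule uses record instead of alert and stores the result of an expression as a new series. The sketch below is illustrative: the resource name, group name, and recorded series name are assumptions, and the expression reuses the connection metric from the alert above.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mariadb-recording       # hypothetical name
  namespace: acme-prod
spec:
  groups:
    - name: mariadb.recording
      rules:
        # precompute the highest connection count per job so dashboards
        # and alerts can query one cheap series
        - record: job:mysql_global_status_threads_connected:max
          expr: max by (job) (mysql_global_status_threads_connected)
```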

See the PrometheusRule CRD for the full schema.
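
Firing alerts only reach an Alertmanager if the Prometheus instance is pointed at one. A minimal sketch, assuming you run your own Alertmanager in acme-monitoring (the resource name acme is an assumption). The Operator exposes it through the alertmanager-operated Service, whose port is named web. Routing and receivers still have to be configured separately before notifications go anywhere.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: acme                    # assumed name
  namespace: acme-monitoring
spec:
  replicas: 1
---
# fragment to merge into the existing Prometheus resource
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: acme
  namespace: acme-monitoring
spec:
  alerting:
    alertmanagers:
      # Service created by the Operator for Alertmanager instances
      - namespace: acme-monitoring
        name: alertmanager-operated
        port: web
```

Prometheus discovers the Alertmanager endpoints through the Kubernetes API, which is one reason the Role and RoleBinding must also exist in acme-monitoring.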

For custom dashboards, run your own Grafana in the monitoring namespace (the same place that can host Loki for your logs) and add your Prometheus instance as a datasource. The Operator exposes the instance through the prometheus-operated Service:

http://prometheus-operated.acme-monitoring.svc:9090

From there you can import community dashboards or build your own, and combine metrics with your Loki logs in a single place.
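
If you provision Grafana from files rather than through the UI, the datasource can be declared in Grafana's datasource provisioning format. The sketch below assumes the file is mounted in Grafana's provisioning/datasources directory; the datasource name is arbitrary.

```yaml
apiVersion: 1
datasources:
  - name: Prometheus            # display name, pick your own
    type: prometheus
    # Grafana's backend queries Prometheus, not the user's browser
    access: proxy
    url: http://prometheus-operated.acme-monitoring.svc:9090
    isDefault: true
```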

Next steps:

  • Pair this with Loki for logs in the same monitoring namespace
  • Import or build a Grafana dashboard for your metrics
  • Configure Grafana Alerting or Alertmanager routing for your PrometheusRule alerts