Prometheus

This tutorial shows how to scrape your application metrics on an h8lio managed cluster using the Prometheus Operator pattern.

The Operator is already there

On h8lio you do not install the kube-prometheus-stack chart yourself. The Prometheus Operator and its CRDs (Prometheus, Alertmanager, ServiceMonitor, PodMonitor, Probe, PrometheusRule, ScrapeConfig, ThanosRuler) are installed and reconciled cluster-wide by the platform. Your job is to create instances and scrape definitions as custom resources in your own namespaces, within the RBAC granted to your role.

Role	`monitoring.coreos.com` resources
Owner / Admin / Operator	full access (create / update / delete `Prometheus`, `Alertmanager`, `ServiceMonitor`, `PodMonitor`, `Probe`, `PrometheusRule`, `ThanosRuler`)
Developer	read-only (`get` / `list` / `watch`)

Installing the chart would also pull in a node-exporter DaemonSet and cluster-scoped CRDs, which a namespace tenant cannot deploy on a managed cluster. Node and host-level metrics are the platform’s responsibility and are surfaced in the shared Grafana described below.

Two ways to see your metrics

The shared Grafana (zero setup)

h8lio provisions a shared Grafana at https://monitoring.h8l.io with a set of read-only dashboards scoped to your organization (CPU, memory, storage per namespace). It is wired automatically when your organization is created, so basic namespace monitoring needs no work from you.

Your own Prometheus + Grafana (custom metrics)

To scrape application metrics (MariaDB, PostgreSQL, your own services) and build your own dashboards and alerts, run your own Prometheus instance. The recommended layout is a dedicated monitoring namespace (an h8lio cluster such as acme-monitoring) that observes your organization’s other namespaces (acme-prod, acme-staging, …). All namespaces of an organization share the label tenant: <organization>, which makes cross-namespace selection a one-liner.

Requirements

kubectl configured for your cluster (see kubectl)
An Owner, Admin, or Operator role on the organization
A monitoring namespace (cluster) to host the instance, for example acme-monitoring
A deployed service exposing a metrics endpoint (typically a Service with a named metrics port)
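
To illustrate the last requirement, here is a minimal sketch of a Service with a named metrics port. The Service name, the labels, and port 9104 (the usual mysqld-exporter port) are assumptions for this example; adapt them to your own workload.

```yaml
# Hypothetical Service in front of a MariaDB pod with a mysqld-exporter sidecar.
apiVersion: v1
kind: Service
metadata:
  name: mariadb-metrics          # assumed name
  namespace: acme-prod
  labels:
    # labels a ServiceMonitor can select on (assumed values)
    app.kubernetes.io/instance: mariadb
    app.kubernetes.io/component: primary
spec:
  selector:
    app.kubernetes.io/instance: mariadb
    app.kubernetes.io/component: primary
  ports:
    # a ServiceMonitor endpoint references the port by this *name*
    - name: metrics
      port: 9104
      targetPort: 9104
```

Naming the port lets scrape definitions stay stable even if the port number changes later.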

Create a Prometheus instance

A Prometheus resource tells the platform Operator to provision and reconcile a Prometheus server for you. It needs a ServiceAccount, and that account needs read access to the discovery objects in every namespace you want to scrape.

Create the instance in your monitoring namespace. The namespace selectors below match every namespace of your organization through its shared tenant label:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: acme-monitoring
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: acme
  namespace: acme-monitoring
spec:
  replicas: 1
  serviceAccountName: prometheus
  retention: 15d
  resources:
    requests:
      memory: 512Mi
  # discover ServiceMonitors / PodMonitors / PrometheusRules across
  # every namespace of your organization (they share the `tenant` label)
  serviceMonitorNamespaceSelector:
    matchLabels:
      tenant: acme
  podMonitorNamespaceSelector:
    matchLabels:
      tenant: acme
  ruleNamespaceSelector:
    matchLabels:
      tenant: acme
  # within those namespaces, select all objects (no label filter)
  serviceMonitorSelector: {}
  podMonitorSelector: {}
  ruleSelector: {}
  # persist the TSDB on a Ceph-backed volume
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: eu-west-fr-gra-block-nvme-ec-ext4
        resources:
          requests:
            storage: 20Gi

The Prometheus pod discovers its targets through the Kubernetes API, so its ServiceAccount needs get / list / watch on the discovery objects in each namespace it scrapes. A namespace-scoped Role (not a ClusterRole) is enough. Apply the pair below in every namespace you collect from, including acme-monitoring itself:

# repeat per scraped namespace: acme-prod, acme-staging, acme-monitoring, ...
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus
  namespace: acme-prod
rules:
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["discovery.k8s.io"]
    resources: ["endpointslices"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus
  namespace: acme-prod
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus
subjects:
  # the instance's ServiceAccount lives in the monitoring namespace
  - kind: ServiceAccount
    name: prometheus
    namespace: acme-monitoring

Once the Operator reconciles the resource, reach the Prometheus UI locally:

kubectl -n acme-monitoring port-forward svc/prometheus-operated 9090:9090

Then open http://localhost:9090 and check Status → Targets.

Service Monitoring

A ServiceMonitor declaratively selects the Kubernetes Services to scrape. Create it in the application’s namespace; the instance picks it up because that namespace carries the tenant label its selector matches.

Example ServiceMonitor scraping MariaDB metrics (exposed by a mysqld-exporter sidecar):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mariadb
  # the application's namespace
  namespace: acme-prod
spec:
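  # Service label whose value becomes the Prometheus `job` label;
  # with the selector below it resolves to "mariadb"
  jobLabel: app.kubernetes.io/instance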
  endpoints:
    - interval: 15s
      # MariaDB service "metrics" endpoint
      port: metrics
  # MariaDB service selector
  selector:
    matchLabels:
      app.kubernetes.io/component: primary
      app.kubernetes.io/instance: mariadb

See the ServiceMonitor CRD for the full schema. PodMonitor is the equivalent that targets pods directly when there is no Service.
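
For the case without a Service, a PodMonitor could look like the sketch below. The workload name `worker`, its pod label, and the container port name `metrics` are hypothetical; only the resource kind and field names come from the Operator API.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: worker                   # hypothetical workload
  namespace: acme-prod
spec:
  podMetricsEndpoints:
    - interval: 30s
      # name of the container port exposing metrics (assumed)
      port: metrics
      path: /metrics
  # pod selector (assumed label)
  selector:
    matchLabels:
      app.kubernetes.io/name: worker
```

It is discovered through the `podMonitorNamespaceSelector` and `podMonitorSelector` of the Prometheus resource, in the same way a ServiceMonitor is discovered through its own selectors.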

Once the ServiceMonitor is applied, the target and its endpoints appear in the Prometheus UI under Status → Targets. If the target is up, you can run your first PromQL queries against the scraped metrics.

External targets (ScrapeConfig)

ServiceMonitor and PodMonitor cover in-cluster targets discovered through Kubernetes service discovery. For targets outside the cluster, or for scrape configurations those CRDs cannot express, use the ScrapeConfig CRD (monitoring.coreos.com/v1alpha1). It is complementary: keep ServiceMonitor / PodMonitor as the default for your apps, and reach for ScrapeConfig only for external or static targets and for advanced service discovery (cloud, DNS, file-based).
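
A minimal sketch of a static external target follows. The object name, the host `db.example.com:9104`, and the extra label are placeholders. A Prometheus resource only loads ScrapeConfig objects when its `scrapeConfigSelector` is set, and the instance defined earlier does not set it, so the first part shows the fragment to merge into that spec. The role table above does not list `ScrapeConfig`, so check that your role is allowed to create it.

```yaml
# 1) Fragment to merge into the existing Prometheus spec (not a complete manifest).
# spec:
#   scrapeConfigNamespaceSelector:
#     matchLabels:
#       tenant: acme
#   scrapeConfigSelector: {}
---
# 2) Hypothetical external target scraped through a static configuration.
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: external-db-exporter     # assumed name
  namespace: acme-monitoring
spec:
  scrapeInterval: 30s
  metricsPath: /metrics
  staticConfigs:
    - targets:
        - db.example.com:9104    # placeholder host:port
      labels:
        service: external-db     # assumed label added to every scraped series
```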

Rules

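PrometheusRule defines recording and alerting rules. Create it in the application’s namespace, like the ServiceMonitor. Firing alerts are visible in the Prometheus UI under Alerts; they reach Alertmanager only if the Prometheus resource points at one through spec.alerting.alertmanagers. They can also drive Grafana Alerting.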

Example PrometheusRule for the MariaDB service above:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mariadb
  # the application's namespace
  namespace: acme-prod
spec:
  groups:
    # rules group name
    - name: mariadb
      rules:
        # check if the MariaDB ServiceMonitor job is down
        - alert: MariaDB-Down
          annotations:
            # absent() returns a series without an `instance` label, so the text is static
            message: No MariaDB target in job mariadb is reporting up
            summary: MariaDB instance is down
          expr: absent(up{job="mariadb"} == 1)
          for: 5m
          labels:
            service: mariadb
            severity: warning
        # check if MariaDB has more than 100 active connections, using PromQL
        - alert: HighMariaDBConnections
          annotations:
            description: >-
              MariaDB has more than 100 active connections for more than 5
              minutes.
            summary: High number of MariaDB connections
          expr: mysql_global_status_threads_connected > 100
          for: 5m
          labels:
            severity: warning

See the PrometheusRule CRD for the full schema.
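
The example above contains only alerting rules. As a sketch of the recording side, the rule below precomputes an aggregate over the same MariaDB metric; the object name, group name, and recorded series name are illustrative choices that follow the usual `level:metric:operation` naming convention.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mariadb-recording        # assumed name
  namespace: acme-prod
spec:
  groups:
    - name: mariadb.recording
      # evaluate this group once per minute
      interval: 1m
      rules:
        # store the per-job maximum of connected threads as a new series
        - record: job:mysql_global_status_threads_connected:max
          expr: max by (job) (mysql_global_status_threads_connected)
```

Dashboards and alerts can then query the recorded series instead of re-evaluating the aggregation on every refresh.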

Visualize in Grafana

For custom dashboards, run your own Grafana in the monitoring namespace (the same place that can host Loki for your logs) and add your Prometheus instance as a datasource. The Operator exposes the instance through the prometheus-operated Service:

http://prometheus-operated.acme-monitoring.svc:9090

From there you can import community dashboards or build your own, and combine metrics with your Loki logs in a single place.
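
If your Grafana is configured through provisioning files rather than the UI, the datasource can be declared as in the sketch below. The display name and the choice to make it the default datasource are assumptions; where the file is mounted depends on how you deploy Grafana.

```yaml
# Grafana datasource provisioning file, typically mounted under
# /etc/grafana/provisioning/datasources/ (depends on your deployment)
apiVersion: 1
datasources:
  - name: Prometheus (acme)      # display name, assumed
    type: prometheus
    # Grafana's backend queries Prometheus, not the browser
    access: proxy
    url: http://prometheus-operated.acme-monitoring.svc:9090
    isDefault: true
```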

Next Steps

Pair this with Loki for logs in the same monitoring namespace
Import or build a Grafana dashboard for your metrics
Configure Grafana Alerting or Alertmanager routing for your PrometheusRule alerts
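
For the Alertmanager step, a minimal sketch is an Alertmanager resource in the monitoring namespace plus an `alerting` section on the Prometheus resource. The instance name `acme` is an assumption. Routing and receivers are configured separately and are not shown here.

```yaml
# Alertmanager instance reconciled by the platform Operator (name is an assumption)
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: acme
  namespace: acme-monitoring
spec:
  replicas: 1
---
# Fragment to merge into the existing Prometheus spec (not a complete manifest).
# The Operator exposes Alertmanager through the alertmanager-operated Service,
# whose HTTP port is named "web".
# spec:
#   alerting:
#     alertmanagers:
#       - namespace: acme-monitoring
#         name: alertmanager-operated
#         port: web
```

Prometheus discovers the Alertmanager endpoints through the Kubernetes API, so its ServiceAccount needs the discovery Role in acme-monitoring as well.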