Federation

Federation allows a Prometheus server to scrape selected time series from another Prometheus server [Prometheus’s documentation].

Cloudify’s use case for federation is a “cross-service federation”:

In cross-service federation, a Prometheus server of one service is configured to scrape selected data from another service’s Prometheus server to enable alerting and queries against both datasets within a single server [Prometheus’s documentation].

The diagram below shows federation as used in Cloudify. Note that Prometheus talks to Status Reporters rather than directly to other Prometheus instances; this reflects the additional nginx component present in every Status Reporter. Black arrows mark the “federation connections”.

Status Reporter federation

Configuration

Targets

Targets for federated scraping are listed in the /etc/prometheus/targets/other_*.yml files. These are the hosts that the federate_* jobs scrape. For example, database nodes are listed in the /etc/prometheus/targets/other_postgres.yml file, which might look like this:

- targets: ["172.22.0.3:8009", "172.22.0.4:8009", "172.22.0.5:8009"]
  labels: {}

For more information about defining targets, see the section on file-based service discovery in Prometheus’s documentation.
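
If such a target file needs to be produced by a script rather than edited by hand, a minimal Python sketch could look like the following. The helper name render_target_group is hypothetical and is not part of Cloudify or Prometheus; port 8009 is taken from the example above, and the sketch relies on JSON lists and objects also being valid YAML flow syntax.

```python
import json


def render_target_group(hosts, port=8009, labels=None):
    """Render one target group in the same layout as the example above.

    Hypothetical helper, for illustration only.
    """
    targets = [f"{host}:{port}" for host in hosts]
    # JSON arrays and objects double as YAML flow sequences and mappings.
    return (
        f"- targets: {json.dumps(targets)}\n"
        f"  labels: {json.dumps(labels or {})}\n"
    )


content = render_target_group(["172.22.0.3", "172.22.0.4", "172.22.0.5"])
print(content, end="")
```

Writing the returned string to a file such as /etc/prometheus/targets/other_postgres.yml would reproduce the example target group for the three database nodes.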

Scraping jobs

The federate_* scraping jobs are defined in the /etc/prometheus/prometheus.yml file. Here is an example of a predefined job that scrapes federated database nodes for postgres_exporter’s metrics:

- job_name: 'federate_postgresql'
  honor_labels: true
  scheme: 'https'
  tls_config:
    ca_file: /etc/cloudify/ssl/monitoring_ca_cert.pem
  basic_auth:
    username: a_user
    password: a_password
  metrics_path: /monitoring/federate
  params:
    'match[]':
      - '{job="postgresql",host!="172.22.0.3"}'
  file_sd_configs:
    - files:
      - '/etc/prometheus/targets/other_postgres.yml'

It reads: for every target listed in the /etc/prometheus/targets/other_postgres.yml file, query the HTTPS endpoint /monitoring/federate, using the given credentials and TLS CA certificate, for any metrics matching the labels job="postgresql" (postgres_exporter’s metrics) and host!="172.22.0.3" (skipping metrics of the node this Prometheus is running on).
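
To make the resulting request more concrete, the following Python sketch assembles the URL that such a job would roughly request from a single target. It is an approximation under stated assumptions: the function name federate_url is hypothetical, the target 172.22.0.4:8009 is taken from the example target file, and the exact percent-encoding Prometheus uses may differ. Basic authentication and TLS verification are left out.

```python
from urllib.parse import urlencode


def federate_url(target, selectors, scheme="https",
                 metrics_path="/monitoring/federate"):
    """Build a federation URL for one target (hypothetical helper).

    Each selector becomes a separate match[] query parameter, mirroring
    the 'match[]' list under params in the job definition.
    """
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"{scheme}://{target}{metrics_path}?{query}"


url = federate_url(
    "172.22.0.4:8009",
    ['{job="postgresql",host!="172.22.0.3"}'],
)
print(url)
```

Requesting a URL of this form with the job's credentials and CA certificate (for example with curl) is one way to check by hand that a federated node answers as expected.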

This configuration requires a full-blown Status Reporter to be available on the federated nodes. That means that not only must a service-specific exporter be installed (e.g. postgres_exporter for database nodes), but also all the other common components: node_exporter, Prometheus and nginx, all configured similarly to what Cloudify provides (proper TLS certificates, the same authentication credentials as on all other nodes, the same open ports, etc.).