Federation
Federation allows a Prometheus server to scrape selected time series from another Prometheus server [Prometheus’s documentation].
Cloudify’s use case for federation is a “cross-service federation”:
In cross-service federation, a Prometheus server of one service is configured to scrape selected data from another service’s Prometheus server to enable alerting and queries against both datasets within a single server [Prometheus’s documentation].
Below is a diagram of federation as used in Cloudify’s use case. Note that Prometheus talks to Status Reporters, not directly to other Prometheus instances; this highlights the additional nginx component present in every Status Reporter. Black arrows mark the “federation connections”.
Configuration
Targets
Targets for federated scraping are listed in the /etc/prometheus/targets/other_*.yml files. These are the hosts used by the federate_* jobs. For example, database nodes are listed in the /etc/prometheus/targets/other_postgres.yml file, which might look like this:
- targets: ["172.22.0.3:8009", "172.22.0.4:8009", "172.22.0.5:8009"]
  labels: {}
For more information about defining targets, see the file-based service discovery section of Prometheus’s documentation.
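A target file like the one above can also be produced by a script rather than written by hand. The following is a minimal sketch of one possible approach, not part of Cloudify's tooling: the helper name render_targets is hypothetical, the hosts and port are taken from the example above, and it assumes that JSON output is acceptable because JSON is, for practical purposes, also valid YAML.

```python
import json


def render_targets(hosts, port=8009, labels=None):
    """Return file-based service discovery content for the given hosts.

    Hypothetical helper: builds one target group with "host:port" entries.
    """
    group = {
        "targets": [f"{host}:{port}" for host in hosts],
        "labels": labels or {},
    }
    # Assumption: the JSON text is stored in a .yml file and parsed as YAML.
    return json.dumps([group], indent=2)


content = render_targets(["172.22.0.3", "172.22.0.4", "172.22.0.5"])
print(content)
```

The printed text describes the same single target group as the hand-written example, only in JSON-style syntax.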
Scraping jobs
The federate_* scraping jobs are defined in the /etc/prometheus/prometheus.yml file. Here is an example of a predefined job that scrapes federated database nodes for postgres_exporter’s metrics:
- job_name: 'federate_postgresql'
  honor_labels: true
  scheme: 'https'
  tls_config:
    ca_file: /etc/cloudify/ssl/monitoring_ca_cert.pem
  basic_auth:
    username: a_user
    password: a_password
  metrics_path: /monitoring/federate
  params:
    'match[]':
      - '{job="postgresql",host!="172.22.0.3"}'
  file_sd_configs:
    - files:
      - '/etc/prometheus/targets/other_postgres.yml'
It reads: query all targets listed in the /etc/prometheus/targets/other_postgres.yml file, on the HTTPS endpoint /monitoring/federate, using the given credentials and TLS CA certificate, for any metrics matching the labels job="postgresql" (postgres_exporter’s) and host!="172.22.0.3" (skip metrics of the node this Prometheus is running on).
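To make the job description more concrete, the sketch below builds the URL that a scrape of one federated target would request. It only illustrates how the scheme, target, metrics path and match[] parameter combine; it is not how Prometheus itself is implemented. The function name federate_url is hypothetical, and the target address and selector are taken from the examples in this section.

```python
from urllib.parse import urlencode


def federate_url(target, selectors, scheme="https", path="/monitoring/federate"):
    """Build the federation URL for one target (hypothetical helper).

    Each selector is sent as its own match[] query parameter.
    """
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"{scheme}://{target}{path}?{query}"


url = federate_url(
    "172.22.0.4:8009",
    ['{job="postgresql",host!="172.22.0.3"}'],
)
print(url)
```

The selector is percent-encoded in the query string. A real request would additionally need the basic-auth credentials and the CA certificate from the job definition.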
This configuration requires a full-blown Status Reporter to be available on the federated nodes. This means that each node needs not only a service-specific exporter (e.g. postgres_exporter for database nodes), but also all the other common components: node_exporter, Prometheus and nginx, all configured similarly to what Cloudify provides (proper TLS certificates, the same authentication credentials as on all other nodes, the same open ports, etc.).