Configuring Amazon CloudWatch Agent to Consume Prometheus Endpoint Metrics
Jesse Aranki
Leveraging CloudWatch for Prometheus Metrics
Co-authored by Aashish Jolly
HashiCorp Terraform Enterprise (TFE) is a self-hosted distribution of Terraform Cloud. It offers enterprises a private instance of the Terraform Cloud application with no resource limits and additional enterprise-grade architectural features like audit logging and SAML single sign-on.
The Terraform Enterprise metrics service collects several runtime metrics. You can use this data to observe your installation in real time. You can also monitor and alert on these metrics to detect anomalous incidents, performance degradation, and utilization trends. Terraform Enterprise aggregates these metrics on a 5-second interval and keeps them in memory for 15 seconds.
Monitoring is not without its challenges if you’re working in EC2. This blog post covers how to set up the Amazon CloudWatch agent to consume TFE metrics from the exposed Prometheus endpoint and forward them to CloudWatch, without needing to run a Prometheus server.
In this blog post, I will work with TFE FDO (Flexible Deployment Options), as this is now the recommended way to run TFE.
Enabling metrics on TFE
Before we begin, let’s ensure the environment is set up correctly. If you haven’t enabled the metrics endpoint yet, you can do so by adding the following lines to the environment and ports sections of your Docker Compose file:
environment: # append these to your existing environment section
  TFE_METRICS_ENABLE: true
  TFE_METRICS_HTTP_PORT: 9090
  TFE_METRICS_HTTPS_PORT: 9091
ports: # append these to your existing ports section
  - "9090:9090"
  - "9091:9091"
Now that’s done, rebuild the container and start it back up again:
docker compose up -d
Validate that it works by curling the endpoint on the TFE server:
curl "http://localhost:9090/metrics?format=prometheus"
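If you want to narrow that output down to just the TFE series, a quick filter on the metric prefix works; this is purely an optional sanity check:
# Show only the metrics with the tfe_ prefix from the Prometheus-format output
curl -s "http://localhost:9090/metrics?format=prometheus" | grep "^tfe_"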
Setting up Amazon CloudWatch agent
Dependencies
The IAM role on your instance profile needs the following permissions to upload logs and metrics to Amazon CloudWatch (an example policy is sketched after the list):
- logs:CreateLogGroup
- logs:CreateLogStream
- logs:PutLogEvents
- cloudwatch:PutMetricData
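As a reference point, a minimal instance profile policy carrying these permissions could look like the sketch below; the statement Sid is arbitrary, and you may want to scope Resource more tightly to fit your own standards:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCloudWatchAgentUploads",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "cloudwatch:PutMetricData"
      ],
      "Resource": "*"
    }
  ]
}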
Now that the endpoint is exposed and reachable from the host, we need to configure the scrape job and the CloudWatch agent.
Create a scraper job file
1. In the CloudWatch agent config directory, create a Prometheus configuration file:
cd /opt/aws/amazon-cloudwatch-agent/etc/ && mkdir prometheus
cd prometheus && vi prometheus.yml
2. Copy the code snippet below into the file:
global:
  scrape_interval: 10s
  scrape_timeout: 10s
scrape_configs:
  - job_name: tfe
    sample_limit: 10000
    static_configs:
      - targets:
          - "localhost:9090"
    scheme: http
    metrics_path: '/metrics'
    params:
      format: ['prometheus']
Update CloudWatch agent config file
Navigate to your CloudWatch agent config file, the default location being:
- Linux:
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
- Windows:
C:\Program Files\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent.json
Add the following code to the config file. If you already have a logs stanza, merge this configuration into it:
"logs": {
"metrics_collected": {
"prometheus": {
"log_group_name": "tfe",
"prometheus_config_path": "/opt/aws/amazon-cloudwatch-agent/etc/prometheus/prometheus.yml",
"emf_processor": {
"metric_declaration_dedup": true,
"metric_namespace": "TFE_Metrics",
"metric_unit": {
"tfe_container_cpu_usage_user": "Counter",
"tfe_container_cpu_usage_kernel": "Counter",
"tfe_container_memory_used_bytes": "Gauge",
"tfe_container_memory_limit": "Gauge",
"tfe_container_network_rx_bytes_total": "Counter",
"tfe_container_network_rx_packets_total": "Counter",
"tfe_container_network_tx_bytes_total": "Counter",
"tfe_container_network_tx_packets_total": "Counter",
"tfe_container_disk_io_op_read_total": "Counter",
"tfe_container_disk_io_op_write_total": "Counter",
"tfe_container_disk_io_bytes_read_total": "Counter",
"tfe_container_disk_io_bytes_write_total": "Counter",
"tfe_container_process_count": "Gauge",
"tfe_container_process_limit": "Gauge"
},
"metric_declaration": [
{
"source_labels": [
"job"
],
"label_matcher": "tfe",
"dimensions": [
[
"run_type",
"host"
]
],
"metric_selectors": [
"^tfe_run_count$"
]
},
{
"source_labels": [
"job"
],
"label_matcher": "^tfe$",
"dimensions": [
[
"name",
"host"
]
],
"metric_selectors": [
"^tfe_container_cpu_usage_user_ns$",
"^tfe_container_cpu_usage_kernel_ns$"
]
}
]
}
}
}
}
Let’s take a closer look at the configuration above. It will:
- Create the metric namespace TFE_Metrics and upload the metrics to that namespace
- Upload 3 metrics:
  - tfe.run.count – The number of running containers used for Terraform operations (runs and plans). The uploaded metric will utilise the host and run_type dimensions to show the count for apply and plan runs.
  - tfe.container.cpu.usage.user – Running count, in nanoseconds, of the total amount of time processes in the container have spent in userspace.
  - tfe.container.cpu.usage.kernel – Running count, in nanoseconds, of the total amount of time processes in the container have spent in kernel space. Both CPU metrics will utilise the host and name dimensions, so the kernel and user metrics appear in the same dimension view within the namespace.
- Upload the metrics to CloudWatch Logs under the tfe log group
It’s also important to note that the metric_unit section of the config file needs to match up with the metric type listed in the HashiCorp documentation linked below.
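Before restarting the agent, it can be worth checking that the merged file is still valid JSON, especially if you hand-edited an existing config. A quick, optional check, assuming python3 is available on the host:
# Prints an error and exits non-zero if the config file is not valid JSON
python3 -m json.tool /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json > /dev/null && echo "config OK"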
Now that the configuration is done, you can restart the CloudWatch agent and verify it works.
systemctl restart amazon-cloudwatch-agent
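If the metrics don’t show up after a few minutes, the agent log (default Linux location shown below) is the first place to look, and the AWS CLI can confirm whether the custom namespace has been created. Both checks are optional and assume the default install path plus a configured AWS CLI:
# Check the agent is running and inspect its log for scrape or permission errors
systemctl status amazon-cloudwatch-agent
tail -n 50 /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log

# Confirm the TFE_Metrics namespace is being populated
aws cloudwatch list-metrics --namespace TFE_Metrics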
The official AWS documentation that runs through Prometheus metric gathering in greater detail can be found here.
For a full list of metrics you can consume and send to CloudWatch, check out the HashiCorp documentation here.
Conclusion
This blog post has outlined the process of setting up the Amazon CloudWatch agent to consume metrics from the HashiCorp Terraform Enterprise (TFE) metrics service. With the CloudWatch agent in place, organizations can monitor and analyze TFE metrics in near real time, detect anomalies, and track performance trends, all without running a separate Prometheus server.