Monitoring Your Harness Delegate With 24/7 Service Guard
In the field, the solution architecture team, code-named the Blackbelt Team, has been seeing a common theme around monitoring delegates. Today’s post provides some insight on how to get started with a project like this.
For the most part, customers want to monitor the delegate’s CPU and memory usage to see how it behaves under specific loads. If the delegate gets overloaded or starts to delay tasks, customers want to know ahead of time. Ultimately, it’s visibility into the delegate’s behavior that our customers want.
In the field, I’ve seen customers use Splunk to gather their delegate logs. For CPU and memory, I’ve seen customers use Prometheus and Wavefront to gather that information. They either use their own dashboarding to chart those metrics over time or leverage Harness 24/7 Service Guard.
24/7 Service Guard can surface this and alert customers when a delegate exceeds a certain threshold, and that threshold is up to the user. Customers can be alerted ahead of time, before the Delegate Down notification pops up in the UI.
In this post, I will walk step by step through setting up Prometheus to monitor the delegate.
Technique to Monitor
Prometheus
- A common technique I see customers use is having Prometheus monitor the CPU and memory metrics of the delegate container
- They wire this up in 24/7 Service Guard based on a specific query
Steps to create this Prometheus Setup
Prerequisites:
- Configure a Prometheus server
- GitHub repo to deploy Prometheus in Kubernetes: rohangupta5328/prometheus (a minimal scrape-config sketch follows this list)
- Configure a Harness Application
- Configure a Harness Environment
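If you are standing up Prometheus yourself rather than using the repo above, here is a minimal sketch of a scrape job that collects container CPU and memory metrics from each node's kubelet cAdvisor endpoint. Treat this as an assumption about a typical in-cluster setup (Prometheus running with a service account that can read node metrics); the job name is arbitrary, the TLS settings vary by cluster, and the linked repo may already include an equivalent job.

# prometheus.yml (excerpt): hypothetical scrape job for cAdvisor container metrics
scrape_configs:
  - job_name: kubernetes-cadvisor
    scheme: https
    metrics_path: /metrics/cadvisor      # kubelet endpoint that exposes container_* metrics
    kubernetes_sd_configs:
      - role: node                       # one scrape target per cluster node
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true         # many clusters use self-signed kubelet certs
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

A job like this is what produces the container_memory_usage_bytes and container_cpu_* series used in the sample queries later in the post.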
Actual Configuration:
Once Prometheus is deployed and configured on the cluster, go to the desired Application in Harness, or create a new Application, to host the Delegate Service we will be creating. We will also be configuring 24/7 Service Guard in an Environment. For this example, I configured a new Application like so:
- Create a Service called ‘Delegate’
This Service is empty; by empty, I mean it should look like this:
- Navigate to the Environment that you want to associate the Prometheus monitoring with; this is where the Delegate Service will be targeted for configuration. Add 24/7 Service Guard to the Harness Application’s Environment
- Pass in the query that targets the Harness Delegate in Prometheus
Sample queries:
avg(container_memory_usage_bytes{pod_name="blackbelt-crasher-dofbpn-0"})
avg(container_cpu_load_average_10s{pod_name="rohan-dofbpn-1"})
avg(container_cpu_usage_seconds_total{pod_name="rohan-dofbpn-1"})
Note: THIS QUERY MUST BE A WORKING QUERY! Substitute your own delegate’s pod name for the example pod names above.
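One way to keep that query working and easy to reason about is to test it in the Prometheus expression browser first and, optionally, capture it as a recording rule so the expression you paste into 24/7 Service Guard stays short and lives in version control with the rest of your Prometheus config. This is an optional sketch, not something Harness requires: the file name, rule names, and the harness-delegate-.* pod-name pattern are hypothetical, so match them to how your delegate pods are actually named (newer cAdvisor versions also label the pod as pod instead of pod_name).

# delegate-rules.yml: hypothetical rule file, loaded from prometheus.yml via rule_files
groups:
  - name: harness-delegate
    rules:
      - record: delegate:container_memory_usage_bytes:avg
        expr: avg(container_memory_usage_bytes{pod_name=~"harness-delegate-.*"})
      - record: delegate:container_cpu_usage_seconds_total:avg
        expr: avg(container_cpu_usage_seconds_total{pod_name=~"harness-delegate-.*"})

With rules like these in place, the query you give 24/7 Service Guard can simply be the recorded series name, for example delegate:container_memory_usage_bytes:avg.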
- Once configured, test the connection to make sure Harness can connect to the Prometheus server and execute the query. In the UI, you should see something like this:
- Hit Submit and view the results in the 24/7 Service Guard dashboard
You can create an Alert Notification to be alerted when transactions read a red value in Harness 24/7 Service Guard. You can configure this portion like so:
Navigate back to the environment where the 24/7 Service Guard is configured. Click on the pencil and enable:
If you want to route this to other User Groups, navigate to the Alert Management section and configure something similar to this:
This will alert the team that owns the delegates about their status and health.
Over some period of time:
Conclusion
Over time, you will be able to see when delegates are being heavily utilized, and you can be proactive about creating an autoscaling policy, or HPA, behind the delegates (see the sketch after this paragraph). If you’re using a Kubernetes delegate, the StatefulSet will spin the delegate pod back up if the delegate container dies from taking on too many tasks, but it is still a good idea to scale up. With 24/7 Service Guard monitoring, a Harness developer or admin can view the delegate’s status and be aware of its usage and current state. We can be preemptive and scale the delegates based on alerts and notifications from 24/7 Service Guard. As an admin, you would know ahead of time and reduce any chance of delegate downtime.
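For reference, a basic HPA against a Kubernetes delegate StatefulSet could look like the sketch below. The names, namespace, and threshold are assumptions, and it presumes metrics-server is installed and the delegate container has CPU requests set; on older clusters the API version would be autoscaling/v2beta2 instead of autoscaling/v2. Adjust it to your own cluster and delegate sizing.

# delegate-hpa.yml: hypothetical HPA for a Kubernetes delegate StatefulSet
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: harness-delegate-hpa
  namespace: harness-delegate        # assumed namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: harness-delegate           # assumed name of the delegate StatefulSet
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization          # scale out when average CPU crosses the target
          averageUtilization: 70

Whether you automate scaling this way or bump replicas by hand, the point is the same: the 24/7 Service Guard alerts give you the signal to act before delegate tasks start queuing.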