Contents
Monitoring On-Premises Components
Introduction
Overview
This article provides examples and best practices for monitoring the health of an On-Premises OverOps installation. There are many monitoring tools available and this document aims to provide general guidance for monitoring the OverOps components regardless of the tools you are using.
Refer to the OverOps Compatibility Guide for all supported software.
Note: Before you begin, use the information provided below to prepare for the procedures and configuration changes described in the main section of this document.
Use Case: Monitoring the State of the OverOps Implementation for On-Premises Deployment
When OverOps is deployed on-premises, the implementation is not monitored by OverOps itself. As a diligent IT Operations or DevOps Manager, you want to put processes in place that help you detect and mitigate potential issues as soon as they occur.
This best practice provides details on how to monitor:
What to Monitor | Goal |
---|---|
Processes | Ensure the process is running |
Ports | Ensure the ports are listening |
Log files | Monitor the log files for errors |
HTTP endpoints | Ensure endpoints respond to requests within an acceptable response time |
Monitoring On-Premises Components
OverOps can be installed on-premises either with Docker, which requires an existing installation of Docker and Docker Compose, or as a standalone non-Docker installation that requires only a JRE. Both installation types are currently supported on Linux only. Understanding the running processes in each installation type is critical for proper health monitoring. The components listed below can be monitored using a typical process watcher.
Dashboard Server Components - in Docker
The Docker installation consists of the Docker service with three running processes. Ensure the following processes are running when Docker is started either as a service or using the takipi-service.sh startup script:
Process | Description |
---|---|
dockerd | Docker daemon |
containerd | Container runtime managed by the Docker daemon |
docker-proxy | Userland proxy for published container ports |
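As an illustration (not part of OverOps), the host processes above can be verified with a short script; any monitoring tool's process-watcher check accomplishes the same thing. This sketch scans `/proc` for the required process names, so it assumes a Linux host:

```python
# Sketch: verify the Docker host processes are running by scanning
# /proc (Linux only). Process names are taken from the table above.
import os

REQUIRED = ["dockerd", "containerd", "docker-proxy"]

def running_processes():
    """Return the set of process names currently visible under /proc."""
    names = set()
    if not os.path.isdir("/proc"):
        return names
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/comm") as f:
                names.add(f.read().strip())
        except OSError:
            continue  # process exited while we were scanning
    return names

def missing(required=REQUIRED):
    """Return the required process names that are not running."""
    procs = running_processes()
    return [p for p in required if p not in procs]

if __name__ == "__main__":
    down = missing()
    print("OK" if not down else "DOWN: " + ", ".join(down))
```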
The following Docker containers run critical processes and must also be running:
Container | Entrypoint Command |
---|---|
takipi_dynalite_1 | /bin/sh -c $TAKIPI_DYNALITE_HOME/entrypoint.sh |
takipi_master_1 | $TAKIPI_DYNALITE_HOME/entrypoint.sh |
takipi_queue_1 | /takipi-entrypoint.sh rabbitmq-server |
takipi_storage_1 | /bin/sh -c /opt/takipi-onprem/takipi-storage/entrypoint.sh |
takipi_mysql_1 | docker-entrypoint.sh mysqld |
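One way to monitor these containers is to compare the required names against the output of `docker ps`. The sketch below is illustrative: the container names come from the table above, and the live check assumes the Docker CLI is on the PATH of the monitoring host.

```python
# Sketch: check the required OverOps containers against the names
# reported by `docker ps --format '{{.Names}}'`.
import subprocess

REQUIRED = {"takipi_dynalite_1", "takipi_master_1", "takipi_queue_1",
            "takipi_storage_1", "takipi_mysql_1"}

def missing_containers(ps_output):
    """Given newline-separated names of running containers, return the
    required containers that are not running (sorted for stable output)."""
    running = set(ps_output.split())
    return sorted(REQUIRED - running)

def live_check():
    """Query Docker directly; requires the docker CLI on the PATH."""
    out = subprocess.run(["docker", "ps", "--format", "{{.Names}}"],
                         capture_output=True, text=True, check=True).stdout
    return missing_containers(out)
```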
The takipi_master_1 container maps to an externally facing port listening on the Docker host; typically, this is port 8080. Internal ports can only be monitored from the host.
Service | Port | Direction |
---|---|---|
Tomcat on takipi_master_1 | 8080 | External |
RabbitMQ on takipi_queue_1 | 4369, 5671-5672, 25672 | Internal |
MySQL | 3306 | Internal |
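A simple TCP connect is enough to confirm a port is listening. The sketch below uses Python's standard socket library; remember that the internal ports must be probed from the Docker host itself:

```python
# Sketch: confirm a service port is accepting TCP connections.
import socket

def port_open(host, port, timeout=2.0):
    """Return True when a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example checks (run on the Docker host):
#   port_open("localhost", 8080)   # Tomcat on takipi_master_1
#   port_open("localhost", 3306)   # MySQL
```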
Dashboard Server Components - in Non-Docker
The non-Docker implementation runs two Java processes.
Process Type | Java Jar File | Service and default ports |
---|---|---|
java | takipi-server/lib/dynalite-java.jar | OverOps Dynalite service -port 4567 |
java | takipi-server/lib/takipi-backend.jar | OverOps embedded Tomcat server -port 8080 |
If OverOps is not connected to an external database such as MySQL, there may be three additional running processes supporting the embedded H2 database that require monitoring.
Process Type | Java Jar File | Database Service and default ports |
---|---|---|
java | takipi-server/lib/h2.jar | stats -tcpPort 5000 |
java | takipi-server/lib/h2.jar | dynalite -tcpPort 5001 |
java | takipi-server/lib/h2.jar | qsql-h2 -tcpPort 5002 |
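Because every process in the non-Docker installation is a plain `java` command, a process watcher should match on the jar name in the command line rather than the process name. A hypothetical sketch (Linux only, scanning `/proc`):

```python
# Sketch: find which expected jars have no running java process,
# by matching jar names against command lines under /proc.
import os

JARS = ["dynalite-java.jar", "takipi-backend.jar"]  # add h2.jar when using H2

def missing_jars(jars=JARS):
    """Return the jars from `jars` with no matching process command line."""
    found = set()
    if os.path.isdir("/proc"):
        for pid in os.listdir("/proc"):
            if not pid.isdigit():
                continue
            try:
                with open(f"/proc/{pid}/cmdline", "rb") as f:
                    # cmdline is NUL-separated; join it into one string
                    cmd = f.read().replace(b"\0", b" ").decode(errors="replace")
            except OSError:
                continue  # process exited while we were scanning
            for jar in jars:
                if jar in cmd:
                    found.add(jar)
    return [j for j in jars if j not in found]
```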
Agent/Collector Monitoring
The OverOps architecture consists of both an Agent and a Collector, but only the Collector service requires monitoring. The process name for the Collector service is takipi-service.
- The Collector is a running daemon, typically launched as a service at system startup.
- Agents are started within the JVM and do not appear as separate running processes. Agent counts are reflected as JVM counts and are displayed in the StatsD data published by the Collector. See StatsD Metrics.
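A minimal liveness check for the Collector might look like the following sketch, which assumes `pgrep` is available on the host:

```python
# Sketch: report whether the takipi-service daemon is running.
import subprocess

def collector_up():
    """True when a process whose command line mentions takipi-service exists."""
    try:
        result = subprocess.run(["pgrep", "-f", "takipi-service"],
                                capture_output=True)
        return result.returncode == 0
    except FileNotFoundError:
        return False  # pgrep not installed; fall back to another check
```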
Additional Monitoring
Endpoint Monitoring
The following endpoints can be monitored for availability and response time:
URL | Assertion | Description |
---|---|---|
http://<server>:8080/login.html | OverOps - Login | Simple assertion to determine whether the application service is running and responding to requests on the Tomcat port |
http://<server>:8080/api/v1/services/<service-id>/views | {"views": | REST API endpoint that provides a list of views available from the server. Requires basic authentication |
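These checks map directly onto any synthetic-monitoring tool. As an illustration, the login-page assertion could be scripted as follows, where `base_url` stands in for your Dashboard server address:

```python
# Sketch: assert the login page returns HTTP 200, contains the expected
# title, and answers within an acceptable response time.
import time
import urllib.request

def check_login(base_url, max_seconds=5.0):
    """Return True when <base_url>/login.html passes all three assertions."""
    start = time.monotonic()
    with urllib.request.urlopen(base_url + "/login.html",
                                timeout=max_seconds) as resp:
        ok = resp.status == 200
        body = resp.read().decode(errors="replace")
    elapsed = time.monotonic() - start
    return ok and "OverOps - Login" in body and elapsed <= max_seconds
```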
Log files
All relevant log files for the various components can be found in the following locations:
Component | Location |
---|---|
Dashboard - Docker version | <install-dir>/takipi-server/storage/tomcat/logs |
Dashboard - Non-Docker version | <install-dir>/takipi-server/log/, e.g. /opt/takipi-server/log/tomcat/Catalina.log |
Collector | <TAKIPI_HOME>/takipi/log/bugtale_service.log |
Agent | <TAKIPI_HOME>/takipi/log/agents |
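Most log-monitoring agents tail these files for error-level entries. The sketch below shows the underlying idea: remember the last read offset so repeated runs report only new errors. The ERROR/SEVERE/FATAL pattern is an assumption; adjust it to the log formats you actually see, and note that a rotated log would need the stored offset reset.

```python
# Sketch: report only the error lines appended since the previous run.
import os
import re

PATTERN = re.compile(r"\b(ERROR|SEVERE|FATAL)\b")  # assumed error markers

def new_errors(log_path, state_path):
    """Scan log_path from the offset stored in state_path; return new hits."""
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            offset = int(f.read() or 0)
    hits = []
    with open(log_path, errors="replace") as f:
        f.seek(offset)
        for line in f:
            if PATTERN.search(line):
                hits.append(line.rstrip())
        offset = f.tell()
    with open(state_path, "w") as f:
        f.write(str(offset))  # remember where we stopped reading
    return hits
```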
StatsD Metrics
OverOps supports sending metrics to third-party graphing and monitoring tools via StatsD, an open-source protocol to capture, aggregate and send metrics to modern DevOps tools. In addition to monitoring, these metrics can be used for Anomaly Detection, Visualization, Analytics, and Telemetry.
Metric | Description |
---|---|
overops_diagnostics_<HOSTNAME>_daemon-pulse | Time series representation of the status of the Collector: 1 means up; 0 or no value means down. |
overops_diagnostics_<HOSTNAME>_s3-calls | Time series representation of the number of calls to the S3 service. |
overops_diagnostics_<HOSTNAME>_backend-calls | Time series representation of the number of calls to the backend service. |
In the example below, OverOps is monitored using InfluxDB as the StatsD database. This query lets you create a table of all running Collectors and their current status, based on whether the pulse appeared in the last time interval:
SELECT mean("value")
FROM /overops_diagnostics_.*_daemon-pulse/
WHERE $timeFilter GROUP BY time(5m)
Collectors
Time | Metric | Value |
---|---|---|
2018-05-25 13:45:00 | overops_diagnostics_ip-172-31-43-62_daemon-pulse.mean | - |
2018-05-25 13:45:00 | overops_diagnostics_ip-10-238-54-161_daemon-pulse.mean | 1.00 |
2018-05-25 13:45:00 | overops_diagnostics_ip-10-159-154-78_daemon-pulse.mean | 1.00 |
2018-05-25 13:45:00 | overops_diagnostics_ip-10-225-180-121_daemon-pulse.mean | 1.00 |
2018-05-25 13:45:00 | overops_diagnostics_ip-10-168-83-132_daemon-pulse.mean | 1.00 |
2018-05-25 13:45:00 | overops_diagnostics_ip-10-164-178-61_daemon-pulse.mean | 1.00 |
2018-05-25 13:45:00 | overops_diagnostics_ip-10-233-123-230_daemon-pulse.mean | 1.00 |
2018-05-25 13:45:00 | overops_diagnostics_ip-10-228-106-172_daemon-pulse.mean | 1.00 |
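Interpreting the query result programmatically follows the same rule as the table above: a missing or zero mean marks a Collector as down. A small illustrative parser:

```python
# Sketch: derive Collector up/down status from daemon-pulse query rows.
import re

METRIC = re.compile(r"overops_diagnostics_(.+)_daemon-pulse")

def collector_status(rows):
    """rows: iterable of (metric_name, mean_or_None) pairs.
    A Collector is UP when its latest daemon-pulse mean is > 0;
    a mean of 0 or no value at all means DOWN."""
    status = {}
    for metric, mean in rows:
        m = METRIC.match(metric)
        if m:
            status[m.group(1)] = "UP" if (mean or 0) > 0 else "DOWN"
    return status
```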