Monitoring Health for On-Premises Deployments

Contents

Introduction

Monitoring On-Premises Components

Additional Monitoring

Introduction

Overview

This article provides examples and best practices for monitoring the health of an On-Premises OverOps installation. There are many monitoring tools available and this document aims to provide general guidance for monitoring the OverOps components regardless of the tools you are using.

Refer to the OverOps Compatibility Guide for all supported software.

Note: Before beginning, use the information provided below to prepare for the procedures and configuration changes described in the main sections of this document.

Use Case: Monitoring the State of the OverOps Implementation for On-Premises Deployment

When OverOps is deployed on-premises, the implementation is not monitored by OverOps itself. As a diligent IT Operations or DevOps Manager, you want to put processes in place that help you detect and mitigate potential issues as soon as they occur.

This best practice provides details on how to monitor:

  • Processes: ensure each process is running
  • Ports: ensure the ports are listening
  • Log files: monitor the log files for errors
  • HTTP endpoints: ensure endpoints respond to requests within an acceptable response time

Monitoring On-Premises Components

OverOps can be installed on-premises either with Docker, which requires an existing installation of Docker and Docker Compose, or as a standalone non-Docker installation that requires only a JRE. Both installation types are currently supported on Linux only. Understanding the processes that run in each installation type is critical for proper health monitoring. The components listed below can be monitored using a typical process watcher.

Dashboard Server Components - in Docker

The Docker installation consists of the Docker service with three running processes. Ensure the following processes are running when Docker is started, either as a service or using the takipi-service.sh startup script:

Process Description
dockerd Docker daemon
containerd Container runtime managed by dockerd
docker-proxy Userland proxy for published container ports
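On a standard Linux host, these processes can be checked with a small shell script around pgrep; a minimal sketch, using the process names listed above:

```shell
#!/bin/sh
# Report whether each expected Docker process is running on this host.
check_process() {
    if pgrep -x "$1" >/dev/null 2>&1; then
        echo "OK: $1 is running"
    else
        echo "CRITICAL: $1 is not running"
    fi
}

for proc in dockerd containerd docker-proxy; do
    check_process "$proc"
done
```

The same function can be reused by any process watcher that treats non-"OK" output as an alert condition.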

The following Docker containers must be running; each runs a critical process:

Container Entrypoint Command
takipi_dynalite_1 "/bin/sh -c $TAKIPI_DYNALITE_HOME/entrypoint.sh"
takipi_master_1 "$TAKIPI_DYNALITE_HOME/entrypoint.sh"
takipi_queue_1 "/takipi-entrypoint.sh rabbitmq-server"
takipi_storage_1 "/bin/sh -c /opt/takipi-onprem/takipi-storage/entrypoint.sh"
takipi_mysql_1 "docker-entrypoint.sh mysqld"
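Container state can be verified from the Docker host with docker inspect; a sketch assuming the default Docker Compose container names from the table above:

```shell
#!/bin/sh
# Verify that each required OverOps container is in the "running" state.
check_container() {
    state=$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)
    if [ "$state" = "true" ]; then
        echo "OK: container $1 is running"
    else
        echo "CRITICAL: container $1 is not running"
    fi
}

for c in takipi_dynalite_1 takipi_master_1 takipi_queue_1 takipi_storage_1 takipi_mysql_1; do
    check_container "$c"
done
```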

The takipi_master_1 container maps to an external-facing port listening on the Docker host; typically this is port 8080. Internal ports can only be monitored from the host.

Service Port Direction
Tomcat on takipi_master_1 8080 External
RabbitMQ on takipi_queue_1 4369, 5671-5672, 25672 Internal
MySQL 3306 Internal
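A simple listening-port check, run on the Docker host itself (required for the internal ports), might look like the following. This sketch assumes iproute2's ss utility is available:

```shell
#!/bin/sh
# Check whether the expected TCP ports are in LISTEN state on this host.
# Internal ports (RabbitMQ, MySQL) must be checked from the Docker host itself.
port_listening() {
    if ss -ltn 2>/dev/null | grep -q ":$1 "; then
        echo "OK: port $1 is listening"
    else
        echo "WARNING: port $1 is not listening"
    fi
}

for port in 8080 4369 5671 5672 25672 3306; do
    port_listening "$port"
done
```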

Dashboard Server Components - in Non-Docker

The non-Docker installation runs two Java processes.

Process Type Java Jar File Service and default ports
java takipi-server/lib/dynalite-java.jar OverOps Dynalite service -port 4567
java takipi-server/lib/takipi-backend.jar OverOps embedded Tomcat server -port 8080

If OverOps is not connected to an external database such as MySQL, there are three additional running processes supporting the embedded H2 database that require monitoring.

Process Type Java Jar File Database Service and default ports
java takipi-server/lib/h2.jar stats -tcpPort 5000
java takipi-server/lib/h2.jar dynalite -tcpPort 5001
java takipi-server/lib/h2.jar qsql-h2 -tcpPort 5002
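Because all of these processes are plain `java` invocations, the most reliable way to distinguish them is to match on the jar name in the full command line; a sketch using pgrep -f:

```shell
#!/bin/sh
# Look for each expected OverOps Java process by its jar name.
check_jar() {
    if pgrep -f "$1" >/dev/null 2>&1; then
        echo "OK: process for $1 found"
    else
        echo "CRITICAL: process for $1 not found"
    fi
}

# Core Dashboard processes
check_jar "dynalite-java.jar"
check_jar "takipi-backend.jar"
# Embedded H2 database processes (only when no external MySQL is used)
check_jar "h2.jar"
```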

Agent/Collector Monitoring

The OverOps architecture consists of both an Agent and a Collector, but only the Collector service requires monitoring. The process name for the Collector service is takipi-service.

  • The Collector is a running daemon typically launched as a service at system startup.
  • Agents are started within the JVM and do not appear as a separate running process. Agent counts are reflected as JVM counts and are displayed in the StatsD data published by the Collector. See StatsD Metrics.
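The Collector daemon can be watched the same way as the other processes; a minimal sketch matching on the takipi-service process name from above:

```shell
#!/bin/sh
# Report the status of the Collector daemon (process name: takipi-service).
collector_status() {
    if pgrep -f "takipi-service" >/dev/null 2>&1; then
        echo "OK: Collector is running"
    else
        echo "CRITICAL: Collector is not running"
    fi
}

collector_status
```

On systemd hosts where the Collector is installed as a service, `systemctl is-active` on the service unit is an equivalent check.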

Additional Monitoring

Endpoint Monitoring

The following endpoints can be monitored for availability and response time:

URL Assertion Description
http://<server>:8080/login.html OverOps - Login Simple assertion to determine whether the application service is running and responding on the Tomcat port
http://<server>:8080/api/v1/services/<service-id>/views {"views": REST API endpoint that returns the list of views available from the server. Requires basic authentication.
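Availability and response time can be probed with curl; a sketch in which the host, the credentials, and the <service-id> path segment are placeholders to be replaced with your own values:

```shell
#!/bin/sh
# Probe an HTTP endpoint and report its status code and total response time.
check_endpoint() {
    result=$(curl -s -o /dev/null --max-time 5 \
        -w '%{http_code} %{time_total}s' "$1" 2>/dev/null)
    echo "$1 -> ${result:-no response}"
}

check_endpoint "http://localhost:8080/login.html"
# The REST API endpoint additionally requires basic authentication, e.g.:
# curl -s -u user:password "http://localhost:8080/api/v1/services/<service-id>/views"
```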

Log files

All relevant log files for the various components can be found in the following locations:

Component Location
Dashboard - Docker version <install-dir>/takipi-server/storage/tomcat/logs
Dashboard - Non-Docker version <install-dir>/takipi-server/log/, e.g. /opt/takipi-server/log/tomcat/tomcat/Catalina.log
Collector /<TAKIPI_HOME>/takipi/log/bugtale_service.log
Agent /<TAKIPI_HOME>/takipi/log/agents
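A basic log check counts error-level lines in a given file; a sketch in which the path and the ERROR/SEVERE patterns are examples to adapt to your log monitoring conventions:

```shell
#!/bin/sh
# Count ERROR/SEVERE lines in a log file; the path below is an example
# taken from the locations table above.
scan_log() {
    if [ ! -r "$1" ]; then
        echo "UNKNOWN: cannot read $1"
        return 0
    fi
    count=$(grep -cE 'ERROR|SEVERE' "$1")
    echo "$1: $count error lines"
}

scan_log "/opt/takipi-server/log/tomcat/tomcat/Catalina.log"
```

In practice a log shipper or a monitoring agent's log plugin would tail these files continuously rather than rescanning them.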

StatsD Metrics

OverOps supports sending metrics to third-party graphing and monitoring tools via StatsD, an open-source protocol to capture, aggregate and send metrics to modern DevOps tools. In addition to monitoring, these metrics can be used for Anomaly Detection, Visualization, Analytics, and Telemetry.

Metric Description
overops_diagnostics_<HOSTNAME>_daemon-pulse Time-series representation of the status of the Collector: 1 means up; 0 or no value means down.
overops_diagnostics_<HOSTNAME>_s3-calls Time-series representation of the number of calls to the S3 service.
overops_diagnostics_<HOSTNAME>_backend-calls Time-series representation of the number of calls to the backend service.

In the example below, OverOps is monitored using InfluxDB as the StatsD database. This query builds a table of all running Collectors and their current status, based on whether the pulse appeared in the most recent time interval:

SELECT mean("value")
FROM /overops_diagnostics_.*_daemon-pulse/
WHERE $timeFilter GROUP BY time(5m)

Collectors

Time Metric Value
2018-05-25 13:45:00 overops_diagnostics_ip-172-31-43-62_daemon-pulse.mean -
2018-05-25 13:45:00 overops_diagnostics_ip-10-238-54-161_daemon-pulse.mean 1.00
2018-05-25 13:45:00 overops_diagnostics_ip-10-159-154-78_daemon-pulse.mean 1.00
2018-05-25 13:45:00 overops_diagnostics_ip-10-225-180-121_daemon-pulse.mean 1.00
2018-05-25 13:45:00 overops_diagnostics_ip-10-168-83-132_daemon-pulse.mean 1.00
2018-05-25 13:45:00 overops_diagnostics_ip-10-164-178-61_daemon-pulse.mean 1.00
2018-05-25 13:45:00 overops_diagnostics_ip-10-233-123-230_daemon-pulse.mean 1.00
2018-05-25 13:45:00 overops_diagnostics_ip-10-228-106-172_daemon-pulse.mean 1.00
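The same pulse query can be automated against InfluxDB's HTTP query API; a sketch in which the host, the default port 8086, and the database name "statsd" are all assumptions to adjust to your setup:

```shell
#!/bin/sh
# Query InfluxDB's HTTP API for the latest Collector pulse values.
# Host, port 8086, and database name "statsd" are assumptions; adjust as needed.
influx_pulse() {
    resp=$(curl -sG --max-time 5 "http://localhost:8086/query" \
        --data-urlencode "db=statsd" \
        --data-urlencode 'q=SELECT last("value") FROM /overops_diagnostics_.*_daemon-pulse/ WHERE time > now() - 10m GROUP BY *' \
        2>/dev/null)
    echo "${resp:-no response from InfluxDB}"
}

influx_pulse
```

A Collector whose series returns no value within the lookback window should be treated as down, matching the daemon-pulse semantics described above.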