Collector / Storage Optimizations

Version

Available with Version 4.12.10 and above.

Introduction

OverOps captures data when application errors (exceptions) and log events (warnings and errors) occur. This data is called a snapshot, and consists of code, variable state, log statements and JVM state.

For efficient data collection, OverOps collects data according to an internal algorithm.

After 24 hours of activity, OverOps takes one snapshot per unique event per day for each application. For 100 instances of an application, there will be 100 snapshots per unique event per day. This can quickly fill up the OverOps Storage Server, especially when there are many unique events per day (for example, 100 application servers throwing 1,800 unique events each produce 180K snapshots a day).
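The arithmetic behind that example can be checked with a short sketch. The function name and parameters are illustrative only and are not part of OverOps; the model simply assumes one snapshot per unique event per instance per day, as described above.

```python
def daily_snapshots(instances: int, unique_events: int, per_event_per_day: int = 1) -> int:
    """Estimate unthrottled snapshots per day across all instances.

    Assumes each instance takes `per_event_per_day` snapshots of every
    unique event it throws (hypothetical model based on the text above).
    """
    return instances * unique_events * per_event_per_day


if __name__ == "__main__":
    # 100 application servers, 1,800 unique events each
    print(daily_snapshots(100, 1800))  # 180000
```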

OverOps can throttle the number of snapshots, reducing storage use and processing time. Additionally, OverOps features an automatic cleanup process that deletes old and irrelevant snapshots from the Storage Server.

This document describes how to configure the optimization settings based on the environment.

Collector Optimizations

OverOps Collectors can throttle snapshots to reduce overall storage and optimize JVM performance. See diagram below for a clear understanding of the properties:

To enable throttling:

  1. From the directory, open the collector.properties file and add the following parameters:

throttlerEnabled = true
Turns throttling on or off

throttlerTimeWindowMillis = 86400000 (= 1 day)
Window of time the throttling is applied to (example: for each day, OverOps applies the rules below)

throttlerNewWindowCount = 2
The number of “new” window periods to use

throttlerMaxNewEvents = 10
Limits the number of snapshots per unique event for the “new” window periods

throttlerMaxUniqueEvents = 10
After the “new” window periods, this is the total number of snapshots allowed per unique event per time window

throttlerMaxTotalEvents = 0
The total number of snapshots allowed for all events. 0 = unlimited, which is the best setting here

throttlerIgnoreMachines = false
True means the number of snapshots per unique event is calculated for all machines together.
False means the number of snapshots per unique event is calculated by each machine independently.

throttlerIgnoreApplications = false
True means the number of snapshots per unique event is calculated for all agents together. False means the number of snapshots per unique event is calculated by each agent independently.

throttlerIgnoreDeployments = false
True means the number of snapshots per unique event is calculated for the application across all deployment versions together (if you deploy a new version of the code, OverOps treats it as the same application and throttles accordingly).
False means the number of snapshots per unique event is calculated for each deployment version of the application independently (if you deploy a new version of the code, OverOps treats it as new, resets the snapshot counts, and throttling starts over).

  2. When finished, restart the Collector.
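To make the interaction of these parameters easier to picture, the following is a minimal sketch of one possible reading of the descriptions above. It is not the Collector's actual algorithm. The class and function names are hypothetical, and the sketch assumes that window periods are counted from the first window in which an event is seen. throttlerTimeWindowMillis and throttlerMaxTotalEvents are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThrottlerConfig:
    """Hypothetical mirror of the collector.properties throttling values."""
    new_window_count: int = 2          # throttlerNewWindowCount
    max_new_events: int = 10           # throttlerMaxNewEvents
    max_unique_events: int = 10        # throttlerMaxUniqueEvents
    ignore_machines: bool = False      # throttlerIgnoreMachines
    ignore_applications: bool = False  # throttlerIgnoreApplications
    ignore_deployments: bool = False   # throttlerIgnoreDeployments


def per_event_cap(cfg: ThrottlerConfig, window_index: int) -> int:
    """Snapshot cap for one unique event in a given window.

    Assumption: window_index 0 is the first window in which the event appears.
    """
    if window_index < cfg.new_window_count:
        return cfg.max_new_events
    return cfg.max_unique_events


def counter_key(cfg: ThrottlerConfig, event: str, machine: str,
                application: str, deployment: str) -> tuple:
    """Scope within which snapshots of an event are counted together.

    An "ignore" flag set to True drops that dimension from the key, so
    counts are shared across all values of that dimension.
    """
    return (
        event,
        None if cfg.ignore_machines else machine,
        None if cfg.ignore_applications else application,
        None if cfg.ignore_deployments else deployment,
    )


if __name__ == "__main__":
    cfg = ThrottlerConfig(max_unique_events=1)
    print([per_event_cap(cfg, i) for i in range(4)])  # [10, 10, 1, 1]
```

Under this reading, setting throttlerIgnoreMachines = true makes two machines that throw the same event share a single counter, while the default (false) gives each machine its own allowance.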

Hybrid Storage Server Optimizations (version 2.0 or above)

Note: Applicable for Hybrid deployments running Storage Server 2.0 or above.

OverOps enables users in Hybrid deployments to free up space by running a periodic cleanup task to remove events older than the defined period from the Storage Server. By default, cleanup is disabled.

To enable cleanup:

  1. From the <HYBRID_STORAGE> directory, open the settings.yml file and set the following parameters:
    cleanupJobEnabled: true

  2. Set the cleanup interval (default is 6 hours):
    jobs:
      cleanup: 24h
    Examples of time units:
    15h - 15 hours
    5d - 5 days

  3. Set the retention period, which is the number of days since an event was last seen after which it is deleted (default is 92):
    retentionPeriodDays: 30

  4. Define the system health threshold, a value between 0 and 1 (default is 0.90, that is, 90%):
    maxUsedStoragePercentage: 0.90

  5. Save and close the settings.yml file and then restart the Storage Server.

The logs will show the status of each cleanup job.
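As an illustration of how the interval and retention values above might be interpreted, here is a small sketch. It is not the Storage Server's implementation: the function names are hypothetical, only the "h" and "d" units shown in the examples are handled, and the exact boundary behavior (whether an event exactly at the retention limit is deleted) is an assumption.

```python
from datetime import datetime, timedelta

# Only the units that appear in the examples above; others are not assumed.
_UNITS = {"h": "hours", "d": "days"}


def parse_interval(text: str) -> timedelta:
    """Convert a value such as '24h' or '5d' into a timedelta."""
    value, unit = int(text[:-1]), text[-1]
    if unit not in _UNITS:
        raise ValueError(f"unsupported time unit: {unit!r}")
    return timedelta(**{_UNITS[unit]: value})


def is_expired(last_seen: datetime, now: datetime, retention_days: int) -> bool:
    """True if the event was last seen more than retention_days ago.

    Assumption: strictly older than the retention period counts as expired.
    """
    return now - last_seen > timedelta(days=retention_days)


if __name__ == "__main__":
    print(parse_interval("24h"))  # 1 day, 0:00:00
    now = datetime(2024, 2, 15)
    print(is_expired(datetime(2024, 1, 1), now, 30))  # True (45 days old)
    print(is_expired(datetime(2024, 2, 1), now, 30))  # False (14 days old)
```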

Note: When the Collector is restarted for any reason, the counters reset.