Agents not connecting to collector due to open file or process limits on the server

Problem:

One or more agents fail to connect to the collector. The collector may also fail to create its log files.

If collector logs are available, they may contain lines similar to the following:

ProcessImpl::readStatFileValue error opening file: <FILENAME>: Too many open files. Error number: 24
ASP/wAs: Proto persistence error (<FILENAME>)
ALGR: Error opening <FILENAME>
Error adding snapshot to batch (agent PID: <REFERENCE>).

Environment:

Agent - All versions

Collector - All versions

SaaS, hybrid, and on-prem environments.

Resolution:

Set the limit for the number of open files to 32768 in the collector's systemd unit file by adding the following directive under the [Service] section:

LimitNOFILE=32768

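The unit file is normally named takipi-server.service and can be found in the /etc/systemd/system/multi-user.target.wants directory.

After saving the change, run systemctl daemon-reload and then restart the collector service (for example, systemctl restart takipi-server) so that the new limit takes effect. The limit applies only to processes started after the change.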
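As an alternative to editing the unit file in place, systemd also reads drop-in override files from a directory named after the unit. Settings placed there are merged with the main unit file and are typically left untouched when a package upgrade replaces the unit file itself. The fragment below is a minimal sketch, assuming the collector unit is named takipi-server.service as described above; running systemctl edit takipi-server opens an editor on a drop-in file at this path.

```ini
# Drop-in override for the collector service.
# Assumed path (unit name taken from this article):
# /etc/systemd/system/takipi-server.service.d/override.conf
[Service]
# Maximum number of open file descriptors (files and network sockets) for the collector process
LimitNOFILE=32768
```

After running systemctl daemon-reload and restarting the collector, systemctl show takipi-server -p LimitNOFILE should report the new value, and the "Max open files" row of /proc/<PID>/limits for the collector process should show the same limit.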

Cause:

The operating system applies a default limit to the number of files a single process can have open at once, and the value it chooses depends on the system's configuration and resources. This default may be too low for the collector. On Linux, network connections (sockets) count toward the same open-file limit as regular files, and a single agent can have multiple threads and connections open to send all of its data to the collector. Once the collector reaches the limit, it can no longer accept new agent connections or open the files it needs for logging and persistence, which produces the "Too many open files" (error number 24) messages shown above.

Adding the LimitNOFILE directive overrides the system's default value for the collector service and raises the limit high enough for all agents to connect to the collector.

Note: If some agents still cannot connect because the collector is reaching this limit, double the value recommended above (LimitNOFILE=65536).