Is it reasonable to think the EOF error comes from Postgres? Is there a way to work around it via configuration, such as extending the session timeout?
Do you have a reverse proxy or load balancer sitting between the server and your runners? If so, the logs may sometimes exceed the maximum request size. The only similar reports of this error we have ever received involved reverse proxies or load balancers terminating the request, hence the EOF.
No. I deploy Docker runners via the autoscaler, and the EC2 instances are in the same VPC subnet as the host where the Drone Docker image runs. Based on a source code search for the error message, I interpreted "manager: cannot update step" to mean that the EOF happened when the server tried to update the step's information in the DB. I was wondering whether the DB session had timed out and caused the EOF error. For example, I see the following message in the psql CLI tool when I leave the connection idle for a while:
drone_ci=> select count(stage_build_id) from stages where stage_error='EOF' group by stage_build_id;
SSL SYSCALL error: EOF detected
The connection to the server was lost. Attempting reset: Succeeded.
Can I work around it by specifying something like tcp_keepalives_idle as a connection option for psql?
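For reference, tcp_keepalives_idle is a server-side Postgres setting. The client-side counterparts in libpq (which psql uses) are the connection parameters keepalives, keepalives_idle, keepalives_interval, and keepalives_count. Below is a minimal sketch of adding them to a connection URI. The helper name with_keepalives, the host, the user, and the timing values are all hypothetical, and whether the driver Drone uses honors these parameters is an assumption that has not been verified in this thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_keepalives(dsn, idle=60, interval=10, count=5):
    """Return a copy of a Postgres connection URI with libpq TCP keepalive
    parameters added. Hypothetical helper; the defaults are illustrative,
    not recommended values."""
    parts = urlsplit(dsn)
    # Keep any parameters already present (e.g. sslmode).
    params = dict(parse_qsl(parts.query))
    params.update(
        {
            "keepalives": "1",  # enable client-side TCP keepalives
            "keepalives_idle": str(idle),  # idle seconds before the first probe
            "keepalives_interval": str(interval),  # seconds between probes
            "keepalives_count": str(count),  # lost probes before giving up
        }
    )
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    # Placeholder URI; substitute your own host, user, and database.
    print(with_keepalives("postgresql://drone@db.example.internal:5432/drone_ci?sslmode=require"))
```

The resulting URI can be passed to psql directly (psql "<uri>") to check whether the idle connection still drops with keepalives enabled. If it does, the disconnect is more likely caused by something other than an idle TCP timeout.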