This document explains how to identify and troubleshoot Free-Log-Service failures in Harness NextGen.
1. This issue can arise for the following reasons:
a. The delegate agent cannot communicate with the build pod.
b. The build pod cannot communicate with the delegate agent.
2. Use of the Feature Flag “CI_INDIRECT_LOG_UPLOAD”
a. If the Feature Flag “CI_INDIRECT_LOG_UPLOAD” is enabled on your account, the build farm talks directly to the Harness log service at http://app.harness.io/log-service
Note: The “CI_INDIRECT_LOG_UPLOAD” Feature Flag takes 60 minutes to become active on the account.
b. If the Feature Flag “CI_INDIRECT_LOG_UPLOAD” is not enabled on your account, the build farm cannot talk to the Harness log service directly. Ask the Support team to enable it for your account, then re-run the pipeline.
Example log output for this failure:
log_key:\\\"accountId:XXXXXXXXXXXXXXX/orgId:XXX/projectId:XXXXXX/pipelineId:test/runSequence:XXXX/level0:pipeline/level1:stages/level2:build/level3:spec/level4:execution/level5:steps/level6:alpine\\\" account_id:\\\"XXXXXXXXXXXXX\\\" container_port:20002\",\"step_id\":\"alpine\"}}"}
time="2022-02-23T05:45:52Z" level=warning msg="http: request error. Retrying ..." error="Put \"https://storage.googleapis.com/free-log-service/XXXXXXXXXXXXX/accountId%XXXXXXXXXXX/orgId%3ANEF/projectId%3Adavidtest/pipelineId%3Atest/runSequence%3A9/level0%3Apipeline/level1%3Astages/level2%3Abuild/level3%3Aspec/level4%3Aexecution/level5%3Asteps/level6%3Aalpine?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=GOOG1EIRML2LPVSNQ5SOOPFC6JJJOGVFCJ34I55YBVUBKT3UL7IVPDNLEOZJA%2F20220223%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220223T054341Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=c93e04539ed0de8d333901e52bf0ea17b278ff32efdadac39f92089d2adc53d7\":Forbidden" path=PUT
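To spot this failure in a large log file, a small script can help. The following is a minimal sketch, assuming the failures look like the warning above: a retry warning whose error is a PUT to the free-log-service bucket that ends in Forbidden. The function name `find_upload_failure` and the regular expression are illustrative, not part of any Harness tooling.

```python
import re
from urllib.parse import unquote

# Assumption: a failed upload is logged as a warning whose error is a PUT to
# the free-log-service bucket followed by ":Forbidden", as in the excerpt above.
# The quotes around the URL may or may not be backslash-escaped.
UPLOAD_FAILURE = re.compile(
    r'level=warning.*Put \\?"'
    r'(?P<url>https://storage\.googleapis\.com/free-log-service/[^"\\]+)'
    r'\\?"\s*:\s*Forbidden'
)


def find_upload_failure(line: str):
    """Return the decoded upload URL (without query string) of a failed
    log upload, or None if the line is not such a failure."""
    match = UPLOAD_FAILURE.search(line)
    if match is None:
        return None
    # Drop the signed query string, then decode %3A back to ':'.
    path = match.group("url").split("?", 1)[0]
    return unquote(path)
```

Running this function over each line of the log returns the affected log key for failed uploads, which shows the pipeline and step whose logs did not upload.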
3. How to resolve Free-Log-Service failures
a. First, verify whether the delegate agent can communicate with the build pod. If it cannot, the cause is most likely the security group.
b. If you are using an EKS cluster, a security group needs to be added to each build pod.
c. Next, verify whether the build pod can communicate with the delegate agent. You can check this by running grpcurl from the build pod.
d. If that check fails, the cause is most likely the security group of the ingress controller, which must allow incoming connections on port 8080, because the build pod talks to the delegate service via its cluster IP.
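If grpcurl is not available in the build pod, a plain TCP check can at least show whether the security group allows the connection. The following is a minimal sketch, not a replacement for the grpcurl check: it only confirms that the port accepts connections, not that the gRPC service behind it is healthy. The helper name `can_connect` is illustrative, and the host is a placeholder you must replace with the cluster IP of your delegate service.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example usage from inside the build pod (the host is a placeholder):
#   can_connect("<delegate-service-cluster-ip>", 8080)
```

A result of False from the build pod on port 8080 is consistent with a security group blocking incoming connections, as described in step d.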