Let’s start with how Harness Delegates perform artifact collection.
- Each delegate has a maximum of 40 threads that can work on artifact collection.
- Each artifact source creates a collection task every 2 minutes.
- Each task uses one of those threads for the duration of the task.
- If we assume the average collection takes 10 seconds (so each thread completes 6 collections per minute) and that the tasks are evenly distributed, then each delegate can perform 40 * 6 = 240 collections per minute.
If that collection duration estimate is drastically wrong, however, the number could be quite different.
If, for example, eight delegates are able to perform the collection tasks, then there is a theoretical maximum of 240 * 8 = 1920 artifact sources that those 8 delegates could handle. If not all eight are scoped to perform the collection tasks, adjust the multiplier accordingly.
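The arithmetic above can be sketched as a small calculator. This is only an illustration of the estimate in the text; the function names and default values are hypothetical and simply restate the 40 threads and 10 second average assumed above.

```python
def collections_per_minute(threads: int = 40, avg_seconds: float = 10.0) -> float:
    """Collections one delegate can finish per minute, assuming even distribution."""
    # Each thread completes 60 / avg_seconds collections per minute.
    return threads * (60.0 / avg_seconds)


def theoretical_capacity(delegates: int, threads: int = 40, avg_seconds: float = 10.0) -> float:
    """Theoretical maximum across all delegates scoped to collection tasks."""
    return delegates * collections_per_minute(threads, avg_seconds)


print(collections_per_minute())                  # 240.0
print(theoretical_capacity(8))                   # 1920.0
# If the average collection actually takes 30 seconds, capacity drops sharply.
print(theoretical_capacity(8, avg_seconds=30))   # 640.0
```

Changing `avg_seconds` shows how sensitive the estimate is to the assumed collection duration.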
Now that we know how delegates perform the collection, let’s see how the artifact collection process works.
The background job for artifact collection runs every minute and fetches any newly available builds; this should not take more than 2-3 minutes.
At times, artifact collection is delayed because of failed jobs, which can happen for multiple reasons.
Artifact collection is also disabled for an artifact stream after too many failed attempts to collect the artifact (3501 attempts).
Below is an example of what is written to the manager logs in that case for that particular artifact stream:
ASYNC_ARTIFACT_CRON: Artifact collection disabled for artifactStream due to too many failures, type: ECR, id: xxxxxxxxxxxxxxx, failed count: 3501
We usually see delays of 15-20 minutes, and sometimes as much as an hour.
Here is an example from the logs:
2021-02-11 17:58:00,992 [BuildSourceCallbackExecutor-0] WARN software.wings.delegatetasks.buildsource.BuildSourceCallback - ASYNC_ARTIFACT_COLLECTION: successfully fetched builds after [8] failures for artifactStream[xxxxxxxxxxxxxxxxx]
This log entry is for artifact stream ID xxxxxxxxxxxxxxxx.
Note that it reports "successfully fetched builds after [8] failures".
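To find such entries across a large manager log, the message can be matched with a regular expression. The sketch below assumes the message format shown in the example log line; the pattern, the function name `parse_recovery`, and the shortened stream ID are illustrative and not part of Harness.

```python
import re

# Pattern modelled on the sample log message above (an assumption about the format).
RECOVERED = re.compile(
    r"ASYNC_ARTIFACT_COLLECTION: successfully fetched builds after "
    r"\[(?P<failures>\d+)\] failures for artifactStream\[(?P<stream_id>[^\]]+)\]"
)


def parse_recovery(line: str):
    """Return (stream_id, failure_count) if the line matches, else None."""
    match = RECOVERED.search(line)
    if match is None:
        return None
    return match.group("stream_id"), int(match.group("failures"))


sample = (
    "2021-02-11 17:58:00,992 [BuildSourceCallbackExecutor-0] WARN "
    "software.wings.delegatetasks.buildsource.BuildSourceCallback - "
    "ASYNC_ARTIFACT_COLLECTION: successfully fetched builds after [8] failures "
    "for artifactStream[xxxxxxxx]"
)
print(parse_recovery(sample))  # ('xxxxxxxx', 8)
```

Streams that repeatedly show a high failure count are good candidates for a closer look at the connector or repository behind them.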
Every failure adds a delay in fetching the new version.
For example:
There is an artifact stream “NexusTest”.
Failed cron attempts add a delay between each iteration:
2, 2, 4, 6, 10, 20 minutes
and the sequence repeats thereafter.
So after 1 failure, the next attempt would be in 2 minutes;
after 2 failures, the next would be in 4 minutes.
Once a collection succeeds, “failedCronAttempts” is reset to 0.
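The delay sequence can be modelled roughly as follows. This is one possible reading and not a confirmed description of the Harness implementation: it assumes the n-th consecutive failure adds the n-th entry of the sequence (wrapping around after the sixth), so that the delays accumulate to 2 minutes after one failure and 4 minutes after two. The function names are hypothetical.

```python
# Delay sequence in minutes, taken from the text; assumed to repeat after the last entry.
BACKOFF_MINUTES = [2, 2, 4, 6, 10, 20]


def delay_for_failure(n: int) -> int:
    """Delay added by the n-th consecutive failure (1-based), under the assumed indexing."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return BACKOFF_MINUTES[(n - 1) % len(BACKOFF_MINUTES)]


def cumulative_delay(failures: int) -> int:
    """Total minutes of delay accumulated over the given number of consecutive failures."""
    return sum(delay_for_failure(i) for i in range(1, failures + 1))


for n in (1, 2, 8):
    print(n, delay_for_failure(n), cumulative_delay(n))
# 1 2 2
# 2 2 4
# 8 2 48
```

Under this reading, the 8 failures in the log example would account for roughly 48 minutes of accumulated delay, which is in line with the delays of up to an hour mentioned earlier.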
You can query your DB to check this count, along with other details of the artifact stream.
db.getCollection('artifactStream').find({serviceId: "Service_ID here"})
Note: This DB query can be run by on-prem customers. For SaaS customers, Harness can run it on their behalf and share the output.
You will see a response similar to the example below:
{ "_id" : "xxxxxxxxxxxxxxxxxxx", "className" : "software.wings.beans.artifact.NexusArtifactStream", "jobname" : "abc-test", "groupId" : "test-test", "imageName" : "test-test", "artifactPaths" : [ "test-nexus" ], "repositoryFormat" : "maven", "artifactStreamType" : "NEXUS", "sourceName" : "test-test/test-test/test-nexus", "settingId" : "xxxxxxxxxxxxx", "name" : "nexustest", "autoPopulate" : false, "serviceId" : "xxxxxxxxxxxx", "metadataOnly" : true, "failedCronAttempts" : 8, "nextIteration" : NumberLong("1614261680347"), "nextCleanupIteration" : NumberLong("1614262841904"), "accountId" : "xxxxxxxxxxxxxxxx", "keywords" : [ "nexus", "test/test-test/test-nexus", "nexustest" ], "sample" : false, "collectionStatus" : "STABLE", "artifactStreamParameterized" : false, "appId" : "xxxxxxxxxxxxxxxxx", "createdAt" : NumberLong("1613678222789"), "lastUpdatedAt" : NumberLong("1614261500333") }
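The `nextIteration`, `createdAt`, and `lastUpdatedAt` values in this output are epoch timestamps in milliseconds. A minimal sketch for converting them to readable UTC times is shown below; the helper name `millis_to_utc` is illustrative, and the two values are copied from the sample document above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds (as stored in NumberLong fields) to a UTC datetime."""
    # Integer timedelta arithmetic avoids floating-point rounding of the milliseconds.
    return EPOCH + timedelta(milliseconds=ms)


next_iteration = millis_to_utc(1614261680347)
last_updated = millis_to_utc(1614261500333)

print(next_iteration.isoformat())   # 2021-02-25T14:01:20.347000+00:00
print(last_updated.isoformat())     # 2021-02-25T13:58:20.333000+00:00
# Gap between the last update and the next scheduled collection attempt.
print(next_iteration - last_updated)  # 0:03:00.014000
```

Comparing `nextIteration` with the current time shows how long the stream will wait before its next collection attempt.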