I have run into a strange problem.
The Drone UI shows a build record that appears never to have been executed.
It looks as though the drone-agent never consumed it. In fact, the drone-agent did consume this record and the build completed (the Docker image was generated normally after the build).
I checked the drone-server and drone-agent logs. They show that the drone-agent passed an ID of zero when notifying the drone-server of the status change.
{"time":"2018-10-19T03:02:09Z","level":"debug","repo":"mbrand/service_acl","build":"62","id":"0","message":"pipeline lease renewed"}
grpc error: extend(): code: Unknown: rpc error: code = Unknown desc = queue: task not found
As the logs above show, the drone-agent uses ID 0 to communicate with the drone-server. I then read the Drone source code; the following is part of the build trigger process (func PostHook() in https://github.com/drone/drone/blob/master/server/hook.go).
func PostHook(c *gin.Context) {
	....
	err = store.CreateBuild(c, build, build.Procs...)
	if err != nil {
		logrus.Errorf("failure to save commit for %s. %s", repo.FullName, err)
		c.AbortWithError(500, err)
		return
	}
	c.JSON(200, build)
	if build.Status == model.StatusBlocked {
		return
	}
	b := builder{
		Repo:  repo,
		Curr:  build,
		Last:  last,
		Netrc: netrc,
		Secs:  secs,
		Regs:  regs,
		Envs:  envs,
		Link:  httputil.GetURL(c.Request),
		Yaml:  conf.Data,
	}
	items, err := b.Build()
	if err != nil {
		build.Status = model.StatusError
		build.Started = time.Now().Unix()
		build.Finished = build.Started
		build.Error = err.Error()
		store.UpdateBuild(c, build)
		return
	}
	var pcounter = len(items)
	for _, item := range items {
		build.Procs = append(build.Procs, item.Proc)
		item.Proc.BuildID = build.ID
		for _, stage := range item.Config.Stages {
			var gid int
			for _, step := range stage.Steps {
				pcounter++
				if gid == 0 {
					gid = pcounter
				}
				proc := &model.Proc{
					BuildID: build.ID,
					Name:    step.Alias,
					PID:     pcounter,
					PPID:    item.Proc.PID,
					PGID:    gid,
					State:   model.StatusPending,
				}
				build.Procs = append(build.Procs, proc)
			}
		}
	}
	err = store.FromContext(c).ProcCreate(build.Procs)
	if err != nil {
		logrus.Errorf("error persisting procs %s/%d: %s", repo.FullName, build.Number, err)
	}
Looking at the code, once
err = store.CreateBuild
succeeds, the handler immediately tells the client that the request succeeded. However, the later call
err = store.FromContext(c).ProcCreate(build.Procs)
can still fail. When it does, the database operation has failed, so the primary key ID of the proc stays 0, and the error is only logged.
As a result, the drone-agent consumes incorrect queue data from the drone-server and then reports the build status to the drone-server with the wrong ID.
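To make the suspected failure mode concrete, here is a minimal, self-contained sketch. It is not Drone's code: Proc, fakeStore and taskID are hypothetical stand-ins, and it assumes the queue task ID is derived from the proc's primary key. It only illustrates that when the insert fails and the error is merely logged, the in-memory ID keeps its zero value and ends up in the queue.

```go
package main

import (
	"errors"
	"fmt"
)

// Proc is a hypothetical stand-in for model.Proc (only the fields needed here).
type Proc struct {
	ID   int64
	Name string
}

// fakeStore is a hypothetical in-memory store. When fail is true,
// ProcCreate returns an error before assigning any primary keys.
type fakeStore struct {
	nextID int64
	fail   bool
}

// ProcCreate assigns sequential IDs on success and leaves procs untouched on failure.
func (s *fakeStore) ProcCreate(procs []*Proc) error {
	if s.fail {
		return errors.New("database unavailable")
	}
	for _, p := range procs {
		s.nextID++
		p.ID = s.nextID
	}
	return nil
}

// taskID mimics deriving a queue task ID from the proc's primary key
// (an assumption about how the server builds task IDs).
func taskID(p *Proc) string {
	return fmt.Sprint(p.ID)
}

func main() {
	procs := []*Proc{{Name: "build"}}
	store := &fakeStore{fail: true}
	if err := store.ProcCreate(procs); err != nil {
		// The error is logged, but execution continues, as in the excerpt above.
		fmt.Println("error persisting procs:", err)
	}
	fmt.Println("queued task id:", taskID(procs[0])) // prints "queued task id: 0"
}
```

If this matches what happens in the server, it would explain the "id":"0" field in the agent log and the "task not found" error when the lease is extended.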
So I would like to ask: in PostHook, why not move
c.JSON(200, build)
to after
err = store.FromContext(c).ProcCreate(build.Procs)
if err != nil {
	logrus.Errorf("error persisting procs %s/%d: %s", repo.FullName, build.Number, err)
}
and handle the error as in the code below?
err = store.FromContext(c).ProcCreate(build.Procs)
if err != nil {
	build.Status = model.StatusError
	build.Error = err.Error()
	store.UpdateBuild(c, build)
	logrus.Errorf("error persisting procs %s/%d: %s", repo.FullName, build.Number, err)
	c.String(500, "Failed to create procs %s/%d: %s", repo.FullName, build.Number, err)
	return
}
If my reasoning is wrong, please tell me why. If I am right, may I submit a pull request to the official repo?