Problems with YAML file being cut off

Apparently I’ve got to use Discourse for this, which is really annoying.
It is about this issue:
https://github.com/drone/drone/issues/2523
To make this reproducible, I created a demo repository:
https://gitlab.com/boredland/nodejs-api-example
And a public and enabled pipeline:
https://drone.onpremise.testsolutions.de/boredland/nodejs-api-example/1

What can I do to track this down?

To clarify:

  1. This branch starts a pipeline, but fails with the “insta” problem:
    https://gitlab.com/boredland/nodejs-api-example/tree/test_demo_node_pipeline
  2. This branch doesn’t start at all:
    https://gitlab.com/boredland/nodejs-api-example/tree/feature/migrate_to_drone

I have the same issue using 1.0.0-rc.5 with the Kubernetes integration. It seems to be caused by “>” characters. For example,

kind: pipeline
name: default

steps:
- name: foo
  image: ubuntu:18.04
  commands:
  - apt update
  - apt install -y curl > /dev/null
  - curl icanhazip.com

results in “apt install -y cur”.

I was able to get around it with

- sh -c "apt update > /dev/null 2>&1"

but then that results in “cannot resolve host icanhazip.co” (removing the last letter from the next step).

I will walk through some steps to help debug this issue.

First, write the YAML to disk:

cat <<EOF > .drone.yml
kind: pipeline
name: default

steps:
- name: foo
  image: ubuntu:18.04
  commands:
  - apt update
  - apt install -y curl > /dev/null
  - curl icanhazip.com
EOF

Second, use the drone-yaml command-line tool to compile the YAML configuration file to the intermediate representation:

drone-yaml compile .drone.yml > .drone.json

The commands are converted to a shell script, which is base64-encoded in the output. If you inspect the JSON output, you will find a data attribute with a long base64-encoded string. To verify that the script was generated properly, we run the following commands to find and decode the string:

cat .drone.json
echo CmlmIFsgLW4gIiRDSV9... | base64 --decode

We can see the script was correctly generated:

set -e

echo + "apt update"
apt update

echo + "apt install -y curl > /dev/null"
apt install -y curl > /dev/null

echo + "curl icanhazip.com"
curl icanhazip.com

We can then use the Drone Kubernetes command-line tool to check that the config map, which stores the script, is created properly:

drone-runtime --kube-debug .drone.json

The above command shows the Kubernetes resources that are created at runtime. We can see the script is included in the config map and is not malformed:

---
data:
  28n8a98jtmn7djsn0xe1znunvrkip3iq: |2+

    set -e

    echo + "apt update"
    apt update

    echo + "apt install -y curl > /dev/null"
    apt install -y curl > /dev/null

    echo + "curl icanhazip.com"
    curl icanhazip.com

kind: ConfigMap
metadata:
  creationTimestamp: null
  name: 28n8a98jtmn7djsn0xe1znunvrkip3iq
  namespace: x7v05lzlt66j091ibmzzmsg9g8rvzcm4

Using the above tools we can see that everything is created properly and there is no indication that Drone is generating malformed scripts or truncating characters.

Unfortunately I am not sure there is much more I can do at this point. I was unable to reproduce the problem, and this issue seems to be very isolated and has not impacted the majority of installs.

In terms of next steps, I would advocate that someone who is able to reproduce the issue look into the code and research further. If that research shows the root cause of this problem resides in the Drone codebase, I would be happy to either accept or collaborate on a patch: https://github.com/drone/drone-runtime

Yes, my first guess when I hit this issue was that it is probably related to my development environment, my text editor, or something like that. Obviously you would be pretty confident that YAML parsing works…
In any case, hopefully we can figure out a workaround or the cause and just leave a FAQ entry for others.

I tried multiple times with files either cat’ed in, edited in Neovim, or edited in the GitLab Web IDE. All have the problem. At the same time, drone-yaml and drone-runtime give me correct results.

I added a sleep to my pipeline so I could check the config map it created. For example, for the sources below (at some point I thought interjecting echo wtff steps might fix the preceding bad lines, lol):

With

---
kind: pipeline
name: default

steps:
- name: bar
  image: ubuntu:18.04
  commands:
  - sleep 120
  - apt update > /dev/null 2>&1
  - echo wtff
  - apt install -y curl > /dev/null 2>&1
  - echo wtff
  - curl -s icanhazip.com
  - echo foo

produces

"data": {
        "ywmfsxd2cfcfyuyi6r9ik72a02dh260u": "\nif [ -n \"$CI_NETRC_MACHINE\" ]; then\ncat \u003c\u003cEOF \u003e $HOME/.netrc\nmachine $CI_NETRC_MACHINE\nlogin $CI_NETRC_USERNAME\npassword $CI_NETRC_PASSWORD\nEOF\nchmod 0600 $HOME/.netrc\nfi\nunset CI_NETRC_USERNAME\nunset CI_NETRC_PASSWORD\nunset DRONE_NETRC_USERNAME\nunset DRONE_NETRC_PASSWORD\nset -e\n\necho + \"sleep 120\"\nsleep 120\n\necho + \"apt updat\"\napt updat\n\n"
    },

This ends with “apt updat”.

Another ConfigMap I grabbed looked like this.

With

---
kind: pipeline
name: default

steps:
- name: bar
  image: ubuntu:18.04
  commands:
  - apt update > /dev/null 2>&1
  - echo wtff
  - apt install -y curl > /dev/null 2>&1
  - echo wtff
  - curl -s icanhazip.com
  - echo foo

produces

 "data": {
        "zhpe0yauxodxdct2rgymk8ycofn1o6mf": "\nif [ -n \"$CI_NETRC_MACHINE\" ]; then\ncat \u003c\u003cEOF \u003e $HOME/.netrc\nmachine $CI_NETRC_MACHINE\nlogin $CI_NETRC_USERNAME\npassword $CI_NETRC_PASSWORD\nEOF\nchmod 0600 $HOME/.netrc\nfi\nunset CI_NETRC_USERNAME\nunset CI_NETRC_PASSWORD\nunset DRONE_NETRC_USERNAME\nunset DRONE_NETRC_PASSWORD\nset -e\n\necho + \"apt update \u003e /dev/null 2\u003e\u00261\"\napt update \u003e /dev/null 2\u003e\u00261\n\necho + \"echo wtff\"\necho wtff\n\necho + \"apt install -y curl \u003e /dev/null\"\napt install -y curl \u003e /dev/null\n\n"
    },

With

---
kind: pipeline
name: default

steps:
- name: bar
  image: ubuntu:18.04
  commands:
  - sleep 120
  - "apt update > /dev/null 2>&1"
  - echo wtff
  - "apt install -y curl > /dev/null 2>&1"
  - echo wtff
  - curl -s icanhazip.com
  - echo foo

I get

 "data": {
        "0woesptn9e1011nff28wdtecm4ihlt6j": "\nif [ -n \"$CI_NETRC_MACHINE\" ]; then\ncat \u003c\u003cEOF \u003e $HOME/.netrc\nmachine $CI_NETRC_MACHINE\nlogin $CI_NETRC_USERNAME\npassword $CI_NETRC_PASSWORD\nEOF\nchmod 0600 $HOME/.netrc\nfi\nunset CI_NETRC_USERNAME\nunset CI_NETRC_PASSWORD\nunset DRONE_NETRC_USERNAME\nunset DRONE_NETRC_PASSWORD\nset -e\n\necho + \"sleep 120\"\nsleep 120\n\necho + \"apt update \u003e /dev/null 2\u003e\u00261\"\napt update \u003e /dev/null 2\u003e\u00261\n\necho + \"echo wtff\"\necho wtff\n\necho + \"apt install -y curl \u003e /dev/null 2\u003e\u00261\"\napt install -y curl \u003e /dev/null 2\u003e\u00261\n\necho + \"echo wtff\"\necho wtff\n\necho + \"curl -s icanhazip.com\"\ncurl -s icanhazip.com\n\necho + \"echo fo\"\necho fo\n\n"
    },

This is functional; it just drops the last character of the last step.

Adding the document end marker

...

at the end stops the last step from losing its last character.

It looks like if I follow strict YAML formatting, it works:

---
kind: pipeline
name: default

steps:
- name: bar
  image: ubuntu:18.04
  commands:
  - sleep 120
  - "apt update > /dev/null 2>&1"
  - echo wtff
  - "apt install -y curl > /dev/null 2>&1"
  - echo wtff
  - curl -s icanhazip.com
  - echo foo
  
...

Does Drone internally run drone-yaml fmt first?

It seems the problem centers on incorrect YAML parsing when the file does not use the “---” and “...” markers and steps containing special characters (is > special?) are not wrapped in double quotes.

Edit: I am now getting a bunch of “unexpected end of stream” errors when trying to write YAML with ... at the end. It seems finicky to get accepted. When I get the unexpected end of stream error, I can remove the double quotes from the quoted steps and it works, but then it goes back to the original problem. For some reason, the pipeline above with the interspersed wtff steps works.

I’m happy to continue digging. It will probably make more sense to wait for the 1.0.0 code to be released, though, so I can develop against an exact release. For now I will stick to 0.8.x. I can try to prepare a Terraform config for launching GitLab + Drone on GKE that shows the problem. It seems like an extremely random problem, and the fact that drone-yaml and drone-runtime work on my local machine is curious.

How does Drone get the .drone.yml? I was able to test it on 3 systems. All have the same error if the repo is on GitLab. However, if the repo is on GitHub, the error does not occur. I suspect it’s the GitLab implementation.

This is our code for downloading a file from GitLab using the GitLab API https://github.com/drone/go-scm/blob/master/scm/driver/gitlab/content.go#L21

I found the problem.

https://docs.gitlab.com/ee/api/repository_files.html

  1. The request in go-scm/content.go (drone/go-scm, master branch) is wrong. … in the test. The repo parameter must be an ID, not “diaspora/diaspora”.

GET /projects/:id/repository/files/:file_path

  2. I tested the request in Paw. It works with the ID.

  3. While debugging, I realized that the file is still being cut off.
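For reference, the request from the GitLab documentation linked above can be built in Go roughly like this. It is a sketch, not go-scm's code: fileURL is a hypothetical helper, and the ref parameter (the branch to read from, which that endpoint requires; master is an assumption here) is omitted from the test URL later in this post. The project is passed as the numeric ID, as in the Paw test; the GitLab docs also accept a URL-encoded namespace/project path, which is why both segments go through url.PathEscape so that / becomes %2F.

```go
package main

import (
	"fmt"
	"net/url"
)

// fileURL builds GET /projects/:id/repository/files/:file_path?ref=<ref>.
// Hypothetical helper. The project and the file path are each escaped as a
// single path segment, so "diaspora/diaspora" becomes "diaspora%2Fdiaspora".
func fileURL(baseURL, project, filePath, ref string) string {
	return fmt.Sprintf("%s/api/v4/projects/%s/repository/files/%s?ref=%s",
		baseURL,
		url.PathEscape(project),
		url.PathEscape(filePath),
		url.QueryEscape(ref),
	)
}

func main() {
	// Numeric project ID from the test link in this thread; branch assumed.
	fmt.Println(fileURL("https://gitlab.com", "10732418", ".drone.yml", "master"))
}
```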

The error is on line 27: “out.Content” still has the full string at that point.
Before base64.RawURLEncoding.DecodeString(out.Content) is called, the string is not yet truncated.
Just change line 26 to raw, _ := base64.StdEncoding.DecodeString(out.Content) and it works <3

https://imgur.com/h0Av2Ow

You can test it here: /api/v4/projects/10732418/repository/files/.drone.yml

Can you please put this into the next RC? We would like to continue :smile:

EDIT: The problem is that RawURLEncoding only decodes up to the first + sign, and + signs can appear in the standard Base64 that GitLab returns.

PS: If you remove the tilde ~ from the .drone.yml file, it will work for now.

@NullP0interEx excellent sleuthing. I patched the go-scm library and will include the updated library in the next release. Thanks!


I can confirm this is fixed in the drone/drone:latest docker image. thanks for the fix @NullP0interEx!


On 1.0.0-rc.6 and 1.0.0 this problem re-emerged. The last line of my .drone.yml was

- docker push $IMAGE

and it failed because it was truncated to “docker push $IMA”.

But I was able to fix it by using more verbose YAML ending with “...”.
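One plausible explanation for why a trailing “...” helps, assuming the same URL-safe decoding as in the go-scm bug diagnosed above (the thread does not confirm what caused the re-emergence in rc.6): when the encoded file contains no + or /, the only characters the decoder rejects are the trailing = padding, so it drops the last (length mod 3) bytes. Appending a “...” line changes the length, which either removes the padding or moves the loss onto the marker instead of the last command. roundTripBuggy is a hypothetical helper.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// roundTripBuggy encodes content with standard, padded base64 and decodes
// it with the unpadded URL-safe alphabet, ignoring the error as before the fix.
func roundTripBuggy(content string) string {
	encoded := base64.StdEncoding.EncodeToString([]byte(content))
	decoded, _ := base64.RawURLEncoding.DecodeString(encoded)
	return string(decoded)
}

func main() {
	for _, content := range []string{
		"- echo foo\n",      // 11 bytes: encoding ends in "=", last partial group lost
		"- echo foo\n...\n", // 15 bytes: no padding, nothing lost
	} {
		fmt.Printf("%q -> %q\n", content, roundTripBuggy(content))
	}
}
```

The first sample comes back as "- echo fo", matching the “echo fo” seen earlier in the thread; the second survives intact.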