Coordinate rollbacks when deploying to multiple regions

As teams mature on their continuous delivery journey, one scenario that comes up often is the need to deploy to multiple regions. With that comes the question, “How do I roll back the deployments in both regions if one of them fails?”

The answer is Barriers.

Barriers

Harness has a feature called barriers, which lets you synchronize multiple workflows in a pipeline. The best way to explain how barriers work is with an example.

Note: This article assumes you already have a Kubernetes service set up and configured in Harness, as well as an environment and a cloud provider connected to a Kubernetes cluster. Although we are using Kubernetes in this example, the same principles apply to any type of deployment.

Setup

We’re going to be building a workflow that will wind up looking like this.

image

Step 1: Create the workflow

We’ll start out by creating a Kubernetes rolling deployment workflow. Once it’s created, edit it and turn the service, environment, and infrastructure definition into template variables by clicking the T icon next to each one.

Step 2: Add the barrier

In the Wrap up phase of our workflow we’ll add the barrier. When the workflow runs, it waits at this step until every other workflow running in parallel has also reached it, and only then moves forward. That means that if one of our workflows fails to reach this step, all of the workflows will fail and initiate a rollback.
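To make that all-or-nothing behavior concrete, here is a conceptual sketch in plain shell. It is not how Harness implements barriers: the deploy function, its arguments, and the variable names are all hypothetical, and background jobs stand in for the parallel workflows.

```shell
#!/usr/bin/env bash
# Conceptual sketch only: shell jobs standing in for Harness workflows.
# deploy, barrier_failed, and result are hypothetical names.

deploy() {
  region="$1"
  should_fail="$2"
  sleep 1   # stand-in for the rolling deployment
  if [ "$should_fail" = "true" ]; then
    echo "$region: deployment failed before reaching the barrier"
    return 1
  fi
  echo "$region: deployment done, waiting at the barrier"
}

# Run both "regions" in parallel, like two parallel pipeline stages.
deploy region1 false &
pid1=$!
deploy region2 true &
pid2=$!

# The barrier: wait for every participant before deciding what happens next.
barrier_failed=0
wait "$pid1" || barrier_failed=1
wait "$pid2" || barrier_failed=1

if [ "$barrier_failed" -ne 0 ]; then
  result="rollback region1 region2"   # one failure rolls back every region
else
  result="continue region1 region2"
fi
echo "$result"
```

Changing the second call to `deploy region2 false` lets both participants reach the barrier, and the sketch reports that both regions continue.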

Click Add Step and choose Barrier

image

Configure the barrier:

image

The identifier is an arbitrary name. Workflows wait for each other at barriers that share the same identifier, so if you use more than one barrier in a workflow, each one needs its own unique identifier.

Step 3: Failure test

To see what happens when a deployment fails, we’re going to add a conditional step that fails on demand, so we can check whether our barrier is working.

Add a Shell Script step in our Deploy phase right after our Rolling Deployment step.

image

Add a script that sleeps for 10 seconds and then exits with a non-zero exit code. This will cause the workflow to fail.
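A minimal sketch of such a script follows. The echo message and the specific exit code 1 are assumptions; any non-zero exit code would do.

```shell
# Body of the Shell Script step (a sketch; the message text is illustrative).
echo "Simulating a failed deployment"
sleep 10   # wait 10 seconds before failing
exit 1     # a non-zero exit code marks the step as failed
```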

image

We don’t want this step to fail every time, so let’s make it execute only when we tell it to. Create a workflow variable, skip_failure, that we can use to trigger the failure.

Now that we have the variable, we can set a skip condition on our failure step. The default behavior will be to always skip it.

Step 4: Set up infrastructure definitions

Next up we need two separate places to deploy our application. We can simulate deploying to separate regions by just deploying to separate namespaces within our cluster. Let’s create two namespaces in our cluster:

kubectl create ns region1
kubectl create ns region2

Now, in our environment, let’s set up matching infrastructure definitions for these namespaces.

Region 1

Region 2

Step 5: Create the pipeline

Now it’s time to tie it all together. First let’s create a pipeline that successfully deploys to both regions so we can make sure everything works.

image

Stage 1

Stage 2

A couple of things to call out here. We are using the exact same workflow for both stages of our pipeline. The only differences are that we are choosing a separate infrastructure definition (simulating a different region), and in stage 2 we indicate that we want to run it in parallel.

Step 6: Test it

Now let’s run it.

Great! Both stages completed successfully and the Failure step was skipped.

Let’s see what happens when one of them fails. Go back and edit the second stage of the pipeline and set the skip_failure variable to false.

Let’s run it again.

Here we see that the deployment to the second region failed, which triggered a rollback in that region.

If we look at the first region, we see that the barrier step failed, which triggered the rollback in that region as well.

image

Wrapping up

Hopefully you can now see how useful barriers are when coordinating complex deployments. Other common scenarios where we see them used are coordinating database upgrades and deploying multiple services.
