Master node backup
The master nodes keep information on the Kubernetes cluster state and can perform operations on the cluster. There is no relation between the master nodes and the Harness database.
If all master nodes are down at the same time but worker nodes are available and all load is on them, Harness will continue to function. However, no changes to the cluster can be made: failed pods will not be replaced, and scaling operations, upgrades, etc. will not be possible until at least one master node is available. Because no changes to the cluster are possible while all master nodes are down, existing backups will still be valid.
Mongo backup
All Harness data is stored in the MongoDB and TimescaleDB databases. Since TimescaleDB does not hold any source of truth (SOT), its data does not need to be backed up; it can be regenerated at any time from the SOT, which is the MongoDB database. Taking a MongoDB backup is therefore sufficient. The following pods need to be stopped or scaled down before taking the backup (see the scale-down sketch after this list):
- Manager pods
- Learning engine pods
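A minimal sketch of scaling these workloads down before the backup and back up afterwards, assuming kubectl access to the cluster; the namespace and deployment names below are assumptions and should be replaced with the actual names from the installation (kubectl get deployments -n <namespace>).

```bash
# Sketch: scale the Harness workloads down so no writes hit MongoDB during the backup.
# NOTE: the namespace and deployment names are assumptions -- verify them first with
#   kubectl get deployments -n <namespace>
NAMESPACE=harness

kubectl scale deployment harness-manager -n "$NAMESPACE" --replicas=0
kubectl scale deployment learning-engine -n "$NAMESPACE" --replicas=0

# ... take the mongodump backup here (see the MongoDB backup steps below) ...

# Scale the workloads back up once the backup has completed
kubectl scale deployment harness-manager -n "$NAMESPACE" --replicas=2
kubectl scale deployment learning-engine -n "$NAMESPACE" --replicas=1
```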
How to take a MongoDB backup:
- The backup script is listed on this page. To take a backup, retrieve the MongoDB URI from the installation and use it in the mongodump script (see the sketch after the install/upgrade steps below).
- The MongoDB replica set holds replicas of the data; there are usually three MongoDB replicas, and all that needs to be done is to execute the command in the link.
- MongoDB replicas are spread across the 3 master nodes; each replica is identical and can serve the load at any given point in time.
- Below is the list of steps we perform to install/upgrade MongoDB (for reference):
  - Export variables
  - Scale down manager
  - Scale down verification service
  - Auth schema & protocol version upgrade
  - Upgrade compatibility check
  - Delete Mongo stateful sets
  - Deploy new Mongo pods
  - Scale up manager
  - Scale up verification service
  - Set compatibility to MongoDB version 4.2
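A minimal sketch of the mongodump step and of the final compatibility step, assuming the MongoDB URI has already been retrieved from the installation; the MONGO_URI value (hosts, credentials, database, replica-set name) and the output path are placeholders, and the actual script on this page should be used as the source of truth.

```bash
# Sketch only: back up MongoDB with mongodump using the URI retrieved
# from the installation. MONGO_URI and the output directory below are
# placeholders -- substitute the values from your own installation.
MONGO_URI="mongodb://<user>:<password>@mongodb-0:27017,mongodb-1:27017,mongodb-2:27017/harness?replicaSet=rs0"

# Dump all databases into a dated directory
mongodump --uri="$MONGO_URI" --out="/backups/mongo-$(date +%F)"

# Last install/upgrade step above: set the feature compatibility version to 4.2
mongo "$MONGO_URI" --eval 'db.adminCommand({ setFeatureCompatibilityVersion: "4.2" })'
```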
Additional Questions & Answers:
- Say, for example, the MongoDB backup happens every day at 12:00. What if one of the master nodes crashes after the backup?
- If one of the master nodes crashes, the MongoDB replicas on the other master nodes will fill in and there will be no impact.
- Say, for example, the MongoDB backup happens every day at 12:00. What if all of the master nodes crash after the backup?
- If all master nodes go down & there are no available worker nodes on which Harness pods are deployed, then Harness will go down.
- If all master nodes go down & there are available worker nodes on which Harness pods are deployed, then Harness will continue to work, but it won't be able to scale up or down.
- If all master nodes go down and come back up after some time & the PVs are intact (not deleted), Harness will automatically recover the data from the existing persistent volumes.
- If all master nodes go down and come back up after some time & the PVs are deleted, then the delta data (changes made after the last backup) can't be recovered. A quick check for whether the PVs survived is sketched below.
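A minimal sketch of checking whether the persistent volumes survived, assuming kubectl access (the namespace is an assumption):

```bash
# Sketch: verify the persistent volumes and claims are still bound after the nodes recover.
# The namespace is an assumption -- use the namespace Harness is installed in.
kubectl get pv
kubectl get pvc -n harness

# PVs/PVCs in "Bound" status mean the data is intact and Harness can recover automatically.
```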
Scale up & down (Memory & CPU):
- Since we use Kubernetes, any scaling operation is done by adding replicas of the existing microservices. That means we will need more CPU and memory, and Harness does not care whether this comes from existing nodes or new ones (see the sketch at the end of this section). The only times that scaling out, rather than up, is required are:
- We want to increase HA capabilities
- The nodes we are on are maxed out, and we cannot scale up
- How to scale memory or CPU up or down on master nodes?
- In the KOTS admin console, vacate the node (there is a button for it). After the node is vacated, shut it down, do the scaling up/down, bring the node back up, and add it back into the cluster; it should work as expected. The number of replicas can then be adjusted as needed (see the sketch below).
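A minimal sketch of both operations, assuming kubectl access; the deployment name and namespace are assumptions, and the drain/uncordon commands are shown only as the command-line equivalent of vacating a node (the KOTS admin button is the documented path).

```bash
# Sketch: scale a Harness microservice out by adding replicas once more CPU/memory is available.
# The deployment name and namespace are assumptions -- verify with `kubectl get deployments`.
kubectl scale deployment harness-manager -n harness --replicas=3

# Command-line equivalent of "vacate the node" before shutting it down for resizing
kubectl drain <node-name> --ignore-daemonsets

# After the resized node rejoins the cluster, make it schedulable again
kubectl uncordon <node-name>
```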
Velero Backup:
At this moment, Replicated's disaster recovery and backup for the embedded cluster is in active development; no ETA is available at this time.