This article describes a number of best practices for managing system fault tolerance when implementing BoKS in your organization.
System fault tolerance
Because BoKS uses a single-master design, you must always have a plan in place for a potential outage of the BoKS Master.
If the Master goes down, the Replicas continue to serve BoKS Server Agents, so the system remains available. However, a number of functions cease to work:
- Database replication
- System administration (including updates of e.g. user accounts to Server Agents)
- Updates to the database, including password updates
- Audit log consolidation
- AD bridge (synchronization from Active Directory)
- LDAP synchronization
Although promoting a Replica to Master is a straightforward process, the Master may hold configuration data that defines its behavior and that may be crucial to the proper functioning of the system.
Recommended best practices:
- Make sure you have system monitoring in place to ensure that any faults are discovered as quickly as possible.
- Have procedures in place that define the circumstances under which you:
a) try to fix any issues on the Master without converting the failover Replica to Master, and
b) convert the failover Replica to Master.
- Designate a Replica to become the new Master in case of a prolonged outage of the current Master. This is primarily a matter of keeping certain configuration files in sync with the regular BoKS Master. For details, see article #11721 Reference: Configuring a Replica for Failover.
- During system maintenance on the BoKS Master (e.g. OS patching), either ensure that enough Replicas are running or communicate an outage to end users. If you have a good estimate of the required Master downtime, you can use your predefined procedures to decide whether to convert a failover Replica to Master.
- In the event of a Replica outage, ensure that the remaining Replicas can handle the load.
- Network topology is important to consider. It is generally recommended that Replicas be deployed in pairs within each data center. If a data center has only one BoKS server, all traffic from that data center goes over the WAN during an outage of that server. It is better to have two or more BoKS servers per data center and to ensure that at least one of them is up during system maintenance.
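
As an illustration only, the decision procedure and the topology recommendation above can be captured in a small script that operators run before planned maintenance. The following Python sketch is not part of BoKS and does not call any BoKS command or API. The server inventory, the BoksServer type, the function names, and the 240-minute threshold are all hypothetical and must be replaced with values from your own procedures and monitoring.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical threshold: Master outages expected to last longer than this
# call for converting the failover Replica. Pick a value that fits your
# own procedures; BoKS does not define one.
MAX_TOLERATED_MASTER_DOWNTIME_MINUTES = 240


@dataclass
class BoksServer:
    """Hypothetical inventory record, filled in from your own monitoring."""
    name: str
    data_center: str
    role: str  # "master" or "replica"
    up: bool


def should_convert_failover_replica(estimated_downtime_minutes: int,
                                    failover_replica_up: bool) -> bool:
    """Return True if the predefined procedure calls for conversion."""
    if not failover_replica_up:
        # No usable failover Replica, so the only option is to fix the Master.
        return False
    return estimated_downtime_minutes > MAX_TOLERATED_MASTER_DOWNTIME_MINUTES


def data_centers_without_local_server(servers: list[BoksServer]) -> list[str]:
    """List data centers where no BoKS server is up.

    Server Agents in these data centers would have to reach a BoKS server
    in another data center, i.e. over the WAN.
    """
    all_data_centers = {s.data_center for s in servers}
    covered = {s.data_center for s in servers if s.up}
    return sorted(all_data_centers - covered)
```

A typical use would be to build the server list from monitoring data with the servers under maintenance marked as down, then postpone the maintenance or announce an outage if data_centers_without_local_server returns a non-empty list.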