This article applies to BoKS 7.0.0

Description

This hotfix addresses multiple issues related to the operation of the BoKS servc receive bridge.

1) The receive bridges locked a semaphore each time they needed a nodekey for a remote host. On the Master and Replicas this caused a large number of semaphore lock/unlock operations for the servc receive bridges. With very many Server Agents connected, this could in extreme cases cause Server Agents to time out and reconnect, increasing the number of receive bridges even further.

2) There was no way to limit the number of simultaneous connections from Server Agents.

3) Access could fail intermittently when a Replica was overloaded.


Resolution / Workaround

To resolve these issues, apply hotfix HFBM-0220, available for download from the HelpSystems Community Portal.

1) The receive bridge now copies the nodekey to local memory when it first loads it, so it only needs to lock the semaphore once.

2) It is now possible to limit the number of simultaneous connections from Server Agents to servc using the ENV variable BRIDGE_SERVC_R_MAX_CONN. The default value is 9950.

3) When a Replica was overloaded, the bridge returned an error that the bridge on the Server Agent did not handle properly, so the error was propagated back to the calling application. The bridge now returns a different error that makes the bridge on the Server Agent close the connection and try to find another server to send the request to.
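
The behavior described in item 3 can be illustrated with a minimal sketch. This is not BoKS code: the ReplicaOverloaded exception, the send_request function and the connect callback are hypothetical names chosen for illustration, and the actual error codes exchanged between the bridges are not documented here.

```python
class ReplicaOverloaded(Exception):
    """Hypothetical stand-in for the error an overloaded Replica now returns."""


def send_request(request, servers, connect):
    """Try each server in turn, moving on when one reports overload.

    `connect` is a hypothetical callback returning an object with
    send(request) and close() methods.
    """
    for server in servers:
        conn = connect(server)
        try:
            return conn.send(request)
        except ReplicaOverloaded:
            # Do not surface the error to the caller; try the next server.
            continue
        finally:
            # The connection is closed whether the request succeeded or not.
            conn.close()
    raise RuntimeError("no server could handle the request")
```

In this sketch the calling application only sees a failure when every candidate server has reported overload.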

This hotfix also speeds up the boks_udsqd process, which acts as a queue in front of servc, on some platforms. It normally uses poll(), but on Linux it now uses epoll() and on Solaris it uses event ports.

This hotfix also includes the improved Replica load balancing from HFBM-0192, so on Master/Replica platforms it supersedes HFBM-0192. The changes to load balancing are described below.

A time limit is added to Server Agents' connections to the Replicas to make it possible to react faster to varying Replica load. The default idle timeout is also reduced from 30 seconds to 10 seconds.
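
As a rough illustration of the two limits above, the following sketch decides when a Server Agent would drop a Replica connection. It reads the "time limit" as a maximum connection lifetime, which is an interpretation; the should_close function and the max_lifetime parameter are hypothetical, and the article does not state the value of the time limit. Only the 10 second idle timeout comes from the text.

```python
def should_close(opened_at, last_used_at, now, max_lifetime, idle_timeout=10.0):
    """Return True if a connection should be closed.

    All times are in seconds. max_lifetime is a hypothetical parameter;
    idle_timeout defaults to the 10 seconds mentioned in the article.
    """
    idle_expired = (now - last_used_at) >= idle_timeout
    lifetime_expired = (now - opened_at) >= max_lifetime
    return idle_expired or lifetime_expired
```

Closing connections after a bounded lifetime forces the Server Agent to select a Replica again, which is what allows it to react to changes in load.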

A Replica that receives a probe message estimates how long an arriving request would wait in the queue before being processed: the average queue time of the last 10 messages multiplied by the current queue length. The estimated queue time is then included in the probe reply.
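
The estimate described above can be sketched as follows. The QueueTimeEstimator class and its method names are hypothetical, and returning 0 when no messages have been recorded yet is an assumption, since the article does not say how that case is handled.

```python
from collections import deque


class QueueTimeEstimator:
    """Estimate queue wait as (average of recent queue times) * (queue length)."""

    def __init__(self, window=10):
        # Keep only the most recent `window` observed queue times.
        self._recent = deque(maxlen=window)

    def record(self, queue_time):
        """Record how long a processed message waited in the queue."""
        self._recent.append(queue_time)

    def estimate(self, queue_length):
        """Return the estimated wait for a request arriving now."""
        if not self._recent:
            return 0.0  # assumption: no history means no expected wait
        average = sum(self._recent) / len(self._recent)
        return average * queue_length
```

For example, if the last 10 messages each waited 0.5 seconds and 4 requests are currently queued, the estimate is 2 seconds.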

The Server Agent sending out probe messages waits for multiple probe replies to be received and then makes a weighted random selection with weights inversely proportional to the estimated queue time received from each Replica. This way the load balancing algorithm is in effect even at low load levels.
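
A weighted random selection of this kind can be sketched as below. The choose_replica function is hypothetical, and the floor value used to avoid division by zero when a Replica reports an estimated queue time of 0 is an assumption; the article does not describe how that case is treated.

```python
import random


def choose_replica(estimates, rng=random, floor=0.001):
    """Pick a Replica with probability inversely proportional to its estimate.

    `estimates` maps a Replica name to its estimated queue time in seconds.
    `floor` is an assumed lower bound that keeps the weights finite.
    """
    names = list(estimates)
    weights = [1.0 / max(estimates[name], floor) for name in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

Because the choice is random rather than "always pick the fastest", requests are still spread across Replicas when all of them report short queue times, which matches the statement that the algorithm is in effect even at low load levels.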

To take full advantage of the load balancing part of this hotfix, install it on the Master and Replicas, and install HFBM-0191 (BoKS 6.7) or HFBM-0192 (BoKS 7.0) on Server Agents. Server Agents that do not have the hotfix installed will use the old load balancing algorithm.



Last Modified On: May 25, 2018