Would you mind sharing how you accomplished the service restarts? When the services start up with global triggering enabled, the worker agents are not yet connected. They take anywhere from a few seconds to a minute to connect, and in the meantime tasks execute and throw AgentNotConnected errors.
To start off with, we tried restarting the services every night. The biggest issue is that the only way to enable triggering is a manual mouse click. We noticed there is a flag in the DB for global triggering, but the services have to be restarted for a new value to take effect: when the value is set to true, the execution service must restart to pick up the change. Once it has restarted, however, triggering is already enabled while the worker agents may still take a few moments to reconnect to the server. That is the main problem, as tasks will execute and fail because the agent is not connected.
We were thinking of using the REST API to check how many workers are connected, but the API shuts down while the execution service restarts, so we can't rely on it.
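One way to close the AgentNotConnected window is to gate the "enable triggering" step on a readiness check: poll until the API answers again and the expected number of worker agents report as connected, with an overall timeout. This is a minimal Python sketch under stated assumptions: `probe` is a hypothetical callable standing in for whatever returns the connected-agent count (no specific BPA endpoint is implied), and network errors while the service is still restarting are treated as "not ready yet" rather than as failures.

```python
import time
from typing import Callable


def wait_for_agents(
    probe: Callable[[], int],
    expected: int,
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once probe() reports at least `expected` connected agents.

    `probe` is a hypothetical helper (e.g. a wrapper around a REST call).
    It may raise OSError (ConnectionError, urllib.error.URLError, ...)
    while the API is down during a restart. Returns False on timeout.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        try:
            if probe() >= expected:
                return True
        except OSError:
            # API not reachable yet (service still restarting): keep waiting.
            pass
        sleep(interval_s)
    return False
```

With 3 worker agents, usage would look like `if wait_for_agents(count_connected_agents, expected=3): enable_triggering()`, where both names are hypothetical. The `sleep` and `clock` parameters exist only so the timing logic can be tested without real delays.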
I'm aware that the third-party tool might not replicate a real human mouse click 100%, and the KeyNotFoundException does not happen on every occasion, but when it does happen we want to know where the issue might be originating from.
Our SMC console sessions are currently set to 30 minutes. We do, however, have logging output enabled; I'm not sure whether this causes memory or CPU spikes.
Our current setup is as follows: 1 BPA server running the management and execution services, and 3 worker agents, each on a different server. We run between 600 and 700 tasks, scheduled anywhere from every few minutes to once every 3 months. Most of our triggers are schedule-based.
We are not able to upgrade to v11, which I believe has solved the CPU issue. In v10, we have experienced issues where the management service slowly consumes RAM until usage peaks at 99%, or the execution service slowly consumes CPU until the server becomes unresponsive, at which point a restart is required. Our concern is that the server stops processing in the early hours of the morning, so BPA does not trigger any workflows until the services are restarted. It takes roughly 4-5 days for the CPU to peak and the server to need a restart.
I believe there have been previous complaints about high CPU usage. We have experienced issues with v11 that currently prevent us from upgrading, so we are trying to solve the CPU issue by performing automated maintenance on the BPA server.
I have requested a feature that would allow us to change the state of global triggering via the REST API, as this would enable us to fully automate the BPA service restart. Currently, the only way is to manually click the trigger button in the SMC console.
Currently we have the BPA server plus 3 separate servers for the 3 agents. Because v10 is prone to high CPU usage, where the server stops responding once the CPU peaks too high, we've built a process that logs in, stops triggering, drains all running tasks, stops and restarts the execution and management services, logs back in, and enables triggering. The problem we've picked up is that once it logs back in, it is not able to enable triggering, and at that point the exception appears in the Event Viewer. A "human" mouse click is needed to enable triggering. This error has only started appearing recently. There aren't any noticeable errors on the agents beyond the ones we normally expect when an agent disconnects from the server during the restart and then reconnects. We are more concerned about why the exception appears when an automated process enables triggering, but not when a human interacts with the server manually.
The forum says that you've replied, however I cannot see your response. I've also noticed another user mentioned that the last reply on a thread does not show; perhaps that is the case here as well.
Does anyone know what the KeyNotFoundException in the Event Viewer logs means?
Basically, when I log in to the SMC console there are about 20 workflows/tasks currently executing. I then turn triggering off, which should let the currently running jobs drain, but instead new workflows/tasks start up, and the API confirms that triggering is still on even though the toggle in the SMC says it has been turned off.
As before, this only happens occasionally.
Hi Alex,
Yes, this is via the SMC UI. It appears to happen at random; on those occasions we have to turn triggering back on and then off again to actually stop tasks from triggering.
On certain occasions I've noticed that when I turn off global triggering in the SMC, tasks still start executing, and the API confirms that global triggering is still enabled. I have to turn global triggering back on and off again to make sure it is actually off.
Has anyone else noticed this? I'm currently using v10.0.7 (I cannot upgrade to v11, as it has too many bugs).
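The manual workaround described here (toggle back on, then off again, until the API agrees) can be automated as a set-then-verify loop. This is a minimal Python sketch under stated assumptions: `set_state` and `read_state` are hypothetical callables, the first wrapping whatever performs the toggle and the second reading the triggering state back (for example through the API, as in the checks above). The retry count is an arbitrary illustrative default.

```python
from typing import Callable


def set_triggering_verified(
    set_state: Callable[[bool], None],
    read_state: Callable[[], bool],
    desired: bool,
    max_attempts: int = 3,
) -> int:
    """Set global triggering to `desired`, read it back, and retry if needed.

    On retries, first flip to the opposite state and then back, mirroring
    the manual workaround. Returns the attempt number that succeeded and
    raises RuntimeError if the read-back state never matches.
    `set_state` and `read_state` are hypothetical wrappers.
    """
    for attempt in range(1, max_attempts + 1):
        if attempt > 1:
            # Workaround from the thread: toggle the other way first.
            set_state(not desired)
        set_state(desired)
        if read_state() == desired:
            return attempt
    raise RuntimeError(
        f"triggering did not settle to {desired} after {max_attempts} attempts"
    )
```

Logging the returned attempt count over time would also show how often the first toggle is being ignored.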