Posted Tue, 16 Jun 2020 08:33:13 GMT by Micheal Carolissen

Hi 

Does anyone know what the KeyNotFoundException means in the Event Viewer logs?

Posted Tue, 16 Jun 2020 14:14:14 GMT by Alex Escalante

Hello Michael,

This is a message that sometimes appears when the agent is experiencing issues. Does this error coincide with any agent errors you may be encountering?

 

Posted Thu, 18 Jun 2020 06:15:59 GMT by Micheal Carolissen

Hi 

Currently we have the BPA server and 3 separate servers for 3 agents. Because v10 is prone to high CPU usage, where the server stops responding once the CPU peaks too high, we've built a process that logs in, stops triggering, drains all running tasks, stops and restarts the Execution and Management services, logs back in, and enables triggering. The problem we've picked up is that once it logs back in, it is not able to enable triggering, at which point the exception appears in the Event Viewer. A "human" mouse click is needed to enable triggering. This error has only started appearing recently. There aren't any noticeable errors on the agents beyond the ones we normally expect when an agent disconnects from the server during the restart and reconnects again. We are more concerned about why the exception appears when an automated process enables triggering but not when a human interacts with the server manually.

Posted Thu, 18 Jun 2020 13:41:40 GMT by Alex Escalante

Hello Michael,

What is being used to accomplish this process? In other words, is a different agent logging into this agent and stopping triggering, restarting services, etc...?

Has there been a ticket for the high CPU you are experiencing? It seems if we troubleshoot the cause of that we can prevent this situation from occurring at all...

Posted Thu, 18 Jun 2020 15:18:29 GMT by Micheal Carolissen

Hi,

We are not able to upgrade to v11, which I believe has solved the CPU issue. In v10, we have experienced issues where the management service slowly consumes RAM until it peaks at 99%, or the execution service slowly consumes CPU until the server becomes unresponsive, at which point a restart is required. Our concern is that the server stops processing in the early hours of the morning, so BPA does not trigger any workflows until the services are restarted. It takes roughly 4-5 days for the CPU to peak to the point where the server needs a restart.

I believe there have been previous complaints about high CPU usage. We have experienced v11 issues that currently prevent us from upgrading, so we are trying to work around the CPU issue by performing automated maintenance on the BPA server.

I have requested a feature that would let us change the state of global triggering via the REST API, as this would enable us to fully automate the BPA service restart. Currently the only way is to manually click the trigger button in the SMC console.

Posted Thu, 18 Jun 2020 17:33:17 GMT by Alex Escalante

Hello Michael,

Understood. Is the "automatic process" that accomplishes this routine AutoMate? Or a 3rd party tool?

We certainly understand the inconvenience this issue is causing and the need to work around it. However, this process is not something that we normally see or come across. It's possible that a 3rd party process, or AutoMate itself, is not handling triggering the same way as a manual interaction. Again, this is not something we can say definitively, as this is not a usual situation.

Since the issue occurs every 4-5 days, would scheduling a window of downtime every 2 or 3 days, during which the services can be restarted, be an option? This way you can schedule tasks not to be running, eliminating the need to "clear tasks", and the restart will reset the memory to clear the issue. You can use a PowerShell or BASIC script to restart the services in the proper order, for example.
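As a rough illustration of such a scripted restart, here is a minimal sketch written in Python rather than PowerShell or BASIC. It shells out to the standard Windows `sc` command. The service names are hypothetical placeholders (check the Services console for the real ones), and the stop/start order shown is an assumption, not a documented requirement.

```python
import subprocess
import time

# Hypothetical names and order: replace with the real service names.
STOP_ORDER = ["ExampleExecutionService", "ExampleManagementService"]


def query_state(name, run=subprocess.run):
    """Return the state word from `sc query`, e.g. RUNNING or STOPPED."""
    result = run(["sc", "query", name], capture_output=True, text=True)
    for line in result.stdout.splitlines():
        if "STATE" in line:
            # Typical line: "STATE              : 4  RUNNING"
            return line.split()[-1]
    return "UNKNOWN"


def wait_for_state(name, wanted, run=subprocess.run,
                   timeout=120, poll=2, sleep=time.sleep):
    """Poll the service until it reports the wanted state or time runs out."""
    waited = 0
    while waited <= timeout:
        if query_state(name, run) == wanted:
            return True
        sleep(poll)
        waited += poll
    return False


def restart_in_order(stop_order, run=subprocess.run, sleep=time.sleep):
    """Stop services in the given order, then start them in reverse order."""
    for name in stop_order:
        run(["sc", "stop", name], capture_output=True, text=True)
        if not wait_for_state(name, "STOPPED", run, sleep=sleep):
            raise RuntimeError(f"{name} did not stop in time")
    for name in reversed(stop_order):
        run(["sc", "start", name], capture_output=True, text=True)
        if not wait_for_state(name, "RUNNING", run, sleep=sleep):
            raise RuntimeError(f"{name} did not start in time")
```

Calling `restart_in_order(STOP_ORDER)` from an elevated prompt inside the scheduled downtime window would perform the restart. Waiting for each service to report STOPPED before moving on avoids starting one service while another is still shutting down.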

If we have not already, we do recommend creating a support ticket to gather data on why the CPU usage is growing in the first place. If the management server process is growing, that is usually related to the SMC being left open for long periods of time, for example. This may not be your exact scenario, but we can at least cross off the things that it could be to get you a more permanent solution other than ultimately upgrading to 11.

Posted Fri, 19 Jun 2020 06:17:14 GMT by Micheal Carolissen

Hi Alex

To start off with, we tried restarting the services every night. The biggest issue is that the only way to enable triggering is a manual mouse click. We noticed there is a flag in the DB for global triggering, but the execution service has to be restarted for a new value to take effect. So when the value is set to true, the execution service must restart to notice the change, and once it has restarted, triggering is already enabled while the worker agents might still take a few moments to reconnect to the server. That is the main problem, as tasks will execute and fail because the agents are not connected.

We were thinking of using the REST API to check how many workers are connected, but the API shuts down when the Execution service restarts, which won't allow us to use it.
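One possible way around the API outage, sketched under assumptions, is to treat a failed call as "not ready yet" and keep polling until the API answers again and reports enough connected agents. In the Python sketch below, `get_connected_count` is a hypothetical callable standing in for whatever REST request returns the number of connected agents; no real endpoint is assumed. This only helps decide when it is safe to enable triggering; it does not solve enabling triggering itself.

```python
import time


def wait_for_agents(get_connected_count, expected, timeout=300.0, interval=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until at least `expected` agents are connected or `timeout` passes.

    `get_connected_count` is a hypothetical callable that queries the API and
    returns an integer. Connection failures (OSError and its subclasses) are
    treated as "API still down" and retried.
    """
    deadline = clock() + timeout
    while True:
        try:
            if get_connected_count() >= expected:
                return True
        except OSError:
            # API not reachable yet, e.g. connection refused during the restart.
            pass
        if clock() >= deadline:
            return False
        sleep(interval)
```

With 3 worker agents, a call such as `wait_for_agents(count_fn, expected=3)` would return True once all three have reconnected, or False if they have not done so within the timeout.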

I'm aware that the 3rd party tool might not replicate a real human mouse click 100%, and the KeyNotFoundException does not happen on every occasion, but when it does happen we want to know where the issue might be originating from.

Currently our SMC console sessions are set to 30 minutes. We do, however, have the logging output set; not sure if this causes memory or CPU spikes.

Our setup is currently as follows: 1 BPA server with the management and execution services, and 3 worker agents, each on a different server. We run between 600 and 700 tasks, with schedules ranging from every few minutes to once every 3 months. Most of our triggers are schedule based.
