Management wants business processes to operate consistently, perfectly, and on time. Service Level Agreements (SLAs) were developed to ensure that this happens. These agreements govern the working lives of many IT professionals, who must monitor, evaluate, control, and sign off on processes to avoid consequences. For example, an SLA might specify that failing to meet certain service levels could result in fines, loss of business, negative performance reviews, or even job termination. This pressure can become so overwhelming that it affects people's lives away from work. But these agreements appear to be here to stay.
Several factors contributed to the rise of the SLA:
So, if SLAs are here to stay, how can you handle them with the least amount of stress? The answer is automation.
In an IBM i-centric data center, Robot Schedule is the heart of any SLA program. It can work in conjunction with Robot Schedule Enterprise to control processes in a consistent manner on IBM i, UNIX, and Windows servers. It also provides the cornerstone for ensuring that these processes run on time.
Checks and balances are important in any schedule. You can’t just create a batch stream of jobs and hope to meet your SLA. As you build your schedule, you must incorporate mechanisms to check the status of critical processes. Robot Schedule, Robot Console, and Robot Alert all work together to help you do this.
Robot Schedule has three job monitors you can define for each critical job on your system. Depending on the requirements of your SLAs, you can monitor for jobs that run too long (job overruns), jobs that complete too quickly (job underruns), and jobs that start later than their scheduled time (late starts). Each monitor offers several options for notifying you of a potential problem. Set the monitor thresholds so that they leave enough time to solve the problem before your SLAs are in danger of not being met.
The Job Overrun job monitor allows you to specify the actions Robot Schedule should take if a job runs longer than it should. You can specify either a maximum time (in hours and minutes) the job should take to complete, or a time by which the job should finish (the choice depends on your SLA requirements):
Maximum duration:
Select Maximum duration to monitor a job based on how long it takes to complete. Enter the maximum time (in hours and minutes) the job can take to complete.
or,
Must complete by:
Select Must complete by to monitor a job based on a time by which the job should finish. Enter the time by which the job should have completed.
Actions:
Specify the actions Robot Schedule should take if the job does not complete in the time allowed. You can end the job, or send a warning to one or more of the following: the job's message queue, a Robot Alert device, or the Robot Network Status Center.
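To make the two Job Overrun options concrete, here is a minimal Python sketch of the decision the monitor makes. It is not Robot Schedule code: the function names `overrun_deadline` and `is_overrun` are hypothetical, and the rule that a "must complete by" time at or before the start time refers to the next day is an assumption for jobs that run across midnight.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def overrun_deadline(start: datetime,
                     max_duration: Optional[timedelta] = None,
                     must_complete_by: Optional[time] = None) -> datetime:
    """Return the moment after which the job counts as an overrun.

    Exactly one of max_duration or must_complete_by must be given,
    mirroring the either/or choice on the Job Overrun monitor.
    """
    if (max_duration is None) == (must_complete_by is None):
        raise ValueError("specify exactly one of max_duration or must_complete_by")
    if max_duration is not None:
        return start + max_duration
    deadline = datetime.combine(start.date(), must_complete_by)
    # Assumption: a completion time at or before the start time means the next day
    # (for example, a job that starts at 23:00 and must complete by 02:00).
    if deadline <= start:
        deadline += timedelta(days=1)
    return deadline


def is_overrun(start: datetime, now: datetime,
               end: Optional[datetime] = None,
               max_duration: Optional[timedelta] = None,
               must_complete_by: Optional[time] = None) -> bool:
    """True if the job finished after its deadline, or is still running past it."""
    deadline = overrun_deadline(start, max_duration, must_complete_by)
    finished_or_now = end if end is not None else now
    return finished_or_now > deadline
```

For a job that starts at 23:00 and is still running at 01:30, `max_duration=timedelta(hours=2)` reports an overrun, while `must_complete_by=time(2, 0)` does not fire yet. Which form fits depends on whether your SLA is stated as a duration or as a clock time.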
The Job Underrun job monitor allows you to specify the actions Robot Schedule should take if a job completes too quickly.
Minimum duration:
Enter the minimum time (in hours and minutes) the job should run before completing.
Actions:
Specify where Robot Schedule should send a warning if the job completes faster than the time specified. You must select at least one of the following, in any combination: the job's message queue, a Robot Alert device, or the Robot Network Status Center. Note: If you don't specify an action, an event is logged in the job's completion history and in the Job Monitor Events Log.
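The underrun test itself is a single comparison, which this minimal Python sketch spells out. The function name `is_underrun` is hypothetical, and treating a runtime exactly equal to the minimum as acceptable is an assumption about the boundary, not documented Robot Schedule behavior.

```python
from datetime import datetime, timedelta


def is_underrun(start: datetime, end: datetime, min_duration: timedelta) -> bool:
    """True if a completed job ran for less than its minimum duration.

    Only completed jobs are checked, because an underrun is known only at completion.
    Assumption: a runtime exactly equal to min_duration is not an underrun.
    """
    if end < start:
        raise ValueError("end must not be earlier than start")
    return end - start < min_duration
```

The monitor's hours-and-minutes entry maps directly onto a `timedelta`, for example `timedelta(hours=0, minutes=20)` for a job that should never finish in under 20 minutes.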
The Late Start job monitor allows you to specify the actions Robot Schedule should take if a job starts later than its scheduled run time. You can enter either the maximum amount of time (in hours and minutes) after its scheduled run time that the job can start, or the latest time by which the job must start.
Later than scheduled by:
Enter the maximum time (in hours and minutes) after its scheduled run time that the job can start.
or,
Must start by:
Select Must start by to monitor a job based on a time by which the job should start. Then, enter the time by which the job should have started.
Actions:
Specify the actions Robot Schedule should take if the job doesn't start within the time specified. You can end the job, or send a warning to one or more of the following: the job's message queue, a Robot Alert device, or the Robot Network Status Center.
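The Late Start options work like the overrun options, measured from the scheduled run time instead of the actual start. The minimal Python sketch below shows the check. The names `latest_allowed_start` and `is_late_start` are hypothetical, and treating a "must start by" time earlier than the scheduled time as the next day is an assumption.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def latest_allowed_start(scheduled: datetime,
                         later_than: Optional[timedelta] = None,
                         must_start_by: Optional[time] = None) -> datetime:
    """Latest acceptable start for a job scheduled at `scheduled`."""
    if (later_than is None) == (must_start_by is None):
        raise ValueError("specify exactly one of later_than or must_start_by")
    if later_than is not None:
        return scheduled + later_than
    limit = datetime.combine(scheduled.date(), must_start_by)
    # Assumption: a "must start by" time earlier than the scheduled time means the next day.
    if limit < scheduled:
        limit += timedelta(days=1)
    return limit


def is_late_start(scheduled: datetime, now: datetime,
                  actual_start: Optional[datetime] = None,
                  later_than: Optional[timedelta] = None,
                  must_start_by: Optional[time] = None) -> bool:
    """True if the job started after its limit, or has not started and the limit has passed."""
    limit = latest_allowed_start(scheduled, later_than, must_start_by)
    started_or_now = actual_start if actual_start is not None else now
    return started_or_now > limit
```

Note that a job that has not started at all is flagged as soon as the limit passes, which is what gives you time to react before the SLA is missed.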
You can use OPAL code to set up another type of Robot Schedule command-type job, called a later-checker job. This type of job runs well after its critical job should have finished. Its OPAL code checks whether the critical job is still active, and its command uses RBASNDMSG to send a message saying that the critical job is still running. You can run later-checker jobs at regular times or only after an IPL, and you can also use them to check the status of important subsystem and communication jobs. They reduce the need for early-morning physical checks around the data center. Because the OPAL code refers to the critical job by name, you don't need to keep track of its Robot Schedule job number.
For example, you could set up a later-checker job using the following command (where jobx is the name of your critical job):
RBTALRLIB/RBASNDMSG MSG('jobx STILL RUNNING') TOPG(SUPPORT)
The OPAL code for this job, shown below, uses the ACTJOB keyword to check whether the critical job is active. If jobx is not active, the later-checker job is skipped and no message is sent.
| Logic | Operand/Variable | Operation | Values |
|---|---|---|---|
| IF | ACTJOB | NE | jobx |
| | SKIP | | |
| END | | | |
This later-checker job runs every day, but sends a message only when jobx is running late.
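The later-checker's logic (skip unless the critical job is active, otherwise send the message) can be modeled in a few lines of Python. This is only an illustration of the control flow, not OPAL: `run_later_checker`, the `active_jobs` list, and the `send_message` callback are hypothetical stand-ins for the ACTJOB test and the RBASNDMSG command, and job names are assumed to match exactly.

```python
from typing import Callable, Iterable


def run_later_checker(critical_job: str,
                      active_jobs: Iterable[str],
                      send_message: Callable[[str], None]) -> bool:
    """Model of the later-checker job. Returns True if a message was sent."""
    # OPAL: IF ACTJOB NE jobx -> SKIP (the critical job is not active, so nothing to report)
    if critical_job not in set(active_jobs):
        return False
    # Otherwise the job's command runs, like RBASNDMSG MSG('jobx STILL RUNNING')
    send_message(f"{critical_job} STILL RUNNING")
    return True
```

Because the check runs every day but only the "still active" branch produces output, the job stays silent on normal nights.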
The Good Morning Report in Robot Schedule, which summarizes job processing during a specific time period, is both a great source of information and a great tool to help you analyze and manage your schedule. To schedule the Good Morning Report in a Robot Schedule job, enter the RBTGM command in the Job Properties Command Entry window and press (or click) the Prompt button to display the command prompt panel. Or, select Job History Reports in the tree view and right-click Good Morning Report in the list view to display the report setup window.
The Good Morning Report can include the following information:
To see the jobs whose runtimes varied from their average, enter a deviation percentage. For example, if you enter 15, the Good Morning Report shows you the jobs whose runtime differed from their average runtime by more than 15 percent.
To see the jobs that varied from a specific forecast, enter the forecast name and a deviation in minutes. For example, if you enter a forecast name and 30, the Good Morning Report shows you the jobs that ran within 30 minutes of the specified forecast. To select from a list of forecasts, click the Prompt button next to the Forecast Name field.
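The percentage-deviation option boils down to comparing each job's runtime with its historical average. The minimal Python sketch below illustrates that arithmetic. The function name and data layout are hypothetical, deviations in either direction (faster or slower) are counted, and how Robot Schedule actually computes its average (which runs, over what period) is not covered here.

```python
from statistics import mean
from typing import Dict, List


def jobs_outside_deviation(history: Dict[str, List[float]],
                           latest: Dict[str, float],
                           percent: float) -> List[str]:
    """Return the jobs whose latest runtime differs from their average runtime
    by more than `percent` percent, in the order they appear in `latest`."""
    flagged = []
    for job, runtime in latest.items():
        past = history.get(job)
        if not past:
            continue  # no history, so there is no average to compare against
        average = mean(past)
        # Compare without dividing, so a zero average does not raise an error.
        if abs(runtime - average) > average * percent / 100:
            flagged.append(job)
    return flagged
```

With a 15 percent threshold, a job that averages 60 minutes is flagged if it runs longer than 69 minutes or shorter than 51 minutes.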
Some users like to set up reactive jobs to notify them when a critical job completes normally. This approach has its pros and cons. On the plus side, it provides a “peace of mind” reminder each day. On the minus side, you are notified every day, even when things are going right.
You can set up a Robot Schedule reactive job that is triggered by the normal completion of a backup job or other critical job. The reactive job has Robot Alert send a message when the critical job completes. Because most jobs run and complete at fairly consistent times each day, you know the approximate time by which you should be notified. If you aren’t notified, you check the system.
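The "if you aren't notified, check the system" rule is a simple deadline check. This minimal Python sketch estimates the expected notification time as the median of past completion times plus a grace period. The function names and the grace period are hypothetical, and the sketch assumes completions never straddle midnight.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import List


def _midnight(moment: datetime) -> datetime:
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)


def notification_deadline(day: datetime, past_completions: List[datetime],
                          grace: timedelta) -> datetime:
    """Typical completion time of day (median of past runs) plus a grace period, on `day`."""
    offsets = [c - _midnight(c) for c in past_completions]
    return _midnight(day) + median(offsets) + grace


def should_check_system(now: datetime, notified: bool, deadline: datetime) -> bool:
    """No completion message by the deadline means someone should check the system."""
    return not notified and now > deadline
```

Using the median rather than the mean keeps one unusually slow night from pushing the expected time much later.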
A final way to check for late-running jobs is by using Robot Schedule with Robot Console resource monitoring. Resource monitoring lets you check on the availability of jobs, subsystems, job queues, objects, controllers, and so on. You also can check to see whether a batch job is running.
You can create a Robot Schedule job to run the Robot Console RBCCHKRSC command for any resource, at any time of the day. If the resource is not in the correct status, the Robot Schedule job fails, and Robot Alert notifies you. Robot Console also monitors QSYSOPR for inquiry messages or other critical messages that can affect your night processing. For example, with Robot Console and Robot Alert, you are notified within seconds if a night processing routine has a decimal data error, or if a file is full.
One of the biggest and often hardest-to-quantify benefits of automation is stress reduction. Without automation, you can spend a lot of time "fighting fires" and discussing what went wrong with night processing. With automation, monitoring your computer is automatic: you keep your finger on the pulse of your computer operations, no matter where you are. Automating your systems can really help you deliver on your SLAs while reducing your stress.