
Threshold condition duration

gunstonp
IR Partner


I have a threshold that monitors all my HP NonStop system processes and triggers when a process exceeds a certain memory consumption, with a check interval of 1 minute, so it looks for high memory users every minute. Every time an excessive memory user is found, I get the alert. That condition might only be transient and not worth investigating. I only want to know if a process has been using excessive memory for a long period of time. Is there a way to trigger the alert only if the condition has existed for, say, more than 10 minutes?

5 REPLIES
SCOTT_BALDWIN
Expert

Re: Threshold condition duration

Hello @gunstonp,

 

The Timing tab of a Threshold Condition has some settings under the Log Frequency section that should meet this request.

 

With the Check Interval set to 1 minute, set the Log After value to 10 (consecutive occurrences), which at that interval equates to roughly 10 minutes.

• This means the Threshold Condition must be true for 10 consecutive 1-minute intervals before the alert will trigger.

 

Optionally, the Log at most every value can also be set to reduce the frequency of repeating alerts for this Condition.
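The behavior of these two settings can be modeled with a small counter. This is a minimal Python sketch, not Prognosis code: `ThresholdSketch` and its fields are hypothetical, it assumes one call per Check Interval, and it assumes Log at most every is measured in check intervals (the real product may measure it differently).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdSketch:
    """Hypothetical model of the Log After / Log at most every settings.

    Not Prognosis code: assumes one check() call per Check Interval and
    measures log_at_most_every in checks (minutes at a 1-minute interval).
    """
    log_after: int                     # consecutive "true" checks needed before alerting
    log_at_most_every: int             # minimum checks between repeat alerts (assumption)
    consecutive: int = 0               # current run of "true" checks
    last_alert: Optional[int] = None   # check number of the last alert raised

    def check(self, check_no: int, where_clause_true: bool) -> bool:
        """Return True if this check would raise an alert."""
        if not where_clause_true:
            self.consecutive = 0       # a single "false" check restarts the count
            return False
        self.consecutive += 1
        if self.consecutive < self.log_after:
            return False               # condition not yet true for long enough
        if (self.last_alert is not None
                and check_no - self.last_alert < self.log_at_most_every):
            return False               # repeat alert suppressed
        self.last_alert = check_no
        return True


# Demo: a process stays over the memory limit for a whole hour (checks 0..59).
t = ThresholdSketch(log_after=10, log_at_most_every=30)
alerts = [minute for minute in range(60) if t.check(minute, True)]
print(alerts)  # [9, 39]

# A transient 5-minute spike never reaches the Log After count.
t2 = ThresholdSketch(log_after=10, log_at_most_every=30)
spike = [minute for minute in range(20) if t2.check(minute, minute < 5)]
print(spike)  # []
```

The first alert lands on the tenth consecutive "true" check (minute 9 when counting from 0), and the transient spike produces no alert at all, which is the outcome the original question asks for.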

 

Example of Timing tab with recommended changes:
[Image: Threshold Condition Timing tab]

 

Thank you,
Scott Baldwin

 

gunstonp
IR Partner

Re: Threshold condition duration

Thanks very much Scott.

 

I tried as you suggested and it works almost as I would want. 

I had previously played with the Log Frequency settings but got lost in the logic. I assumed that setting Log After to 10 consecutive occurrences meant the condition would have to occur 10 separate times before the threshold triggered. I can see that it actually means, as you said, that the condition must be true for 10 consecutive <check interval> minute intervals.

So I ran an experiment with a tape device. I stopped the device and waited 8 minutes, and the threshold was not triggered, which was good. Then I restarted the device, immediately stopped it again, and about a minute later the threshold was triggered. So it didn't wait for another 10 consecutive occurrences; it just carried on counting from where it had left off.

I want the condition to only be reported on if it has existed for 10 minutes. In the case where I stopped the device for the second time the condition had only been true for about a minute.

Paul.

 

 

SCOTT_BALDWIN
Expert

Re: Threshold condition duration

Hello Paul (@gunstonp),

 

If the Log After field is set to 10 with a 1-minute Check Interval, Prognosis should wait for 10 consecutive "true" results from the Where Clause (between 9 and 10 minutes) before the alert triggers, which matches your expectation of how the Threshold Condition should behave.

 

In your test, Prognosis might have missed the tape device being back in service for a short time after the 8 minutes of downtime: it may not have performed a check while the device was in service.


Examples:

Even though the Threshold Condition checks the record every minute, the back end might only update the record every 5 minutes.

  • If the update only happens every 5 minutes and the tape device was down for less than 5 minutes, but was up at the beginning of the 5 minutes, then down, then up again when the next check was done, the alert would still have triggered.
  • The same would happen if the check interval is short (maybe 60 seconds) but the tape device was down for only a very short time.
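The effect of a slow back-end refresh on Paul's test can be illustrated with a small simulation. This is a hypothetical Python sketch, not Prognosis internals: `visible_state`, `first_alert`, and the scenario (device down from minute 0, back up only during minute 8) are assumptions chosen to show that only a check which actually *sees* the device up resets the count.

```python
from typing import Callable, Optional


def visible_state(actually_down: Callable[[int], bool], refresh_every: int, minute: int) -> bool:
    """What the threshold sees at `minute`: the state at the last back-end refresh."""
    last_refresh = (minute // refresh_every) * refresh_every
    return actually_down(last_refresh)


def first_alert(actually_down: Callable[[int], bool], log_after: int,
                refresh_every: int, horizon: int) -> Optional[int]:
    """Minute of the first alert with a 1-minute Check Interval, or None."""
    consecutive = 0
    for minute in range(horizon):
        if visible_state(actually_down, refresh_every, minute):
            consecutive += 1
            if consecutive >= log_after:
                return minute
        else:
            consecutive = 0   # only a check that sees "up" resets the count
    return None


def down_except_minute_8(minute: int) -> bool:
    """Hypothetical outage: the device is down except for a brief restart at minute 8."""
    return minute != 8


# Record refreshed every minute: the brief "up" is seen, so counting restarts.
print(first_alert(down_except_minute_8, 10, refresh_every=1, horizon=60))  # 18
# Record refreshed every 5 minutes: the "up" is never seen, so the count continues.
print(first_alert(down_except_minute_8, 10, refresh_every=5, horizon=60))  # 9
```

With a 5-minute refresh the alert fires at minute 9, shortly after the device was "stopped again", which resembles the behavior Paul reported.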

Another possible cause for the unexpected behavior might be the timing of Prognosis counting the down condition 10 times.

  • After the first time the Where Clause is found "true", Prognosis waits for another 9 "true" checks. The device could therefore have been down for as little as 9 minutes 1 second.
  • Potentially, the "8 minutes" of the tape device being down could actually have stretched to just over 9 minutes, and the Threshold was already in the process of sending an alert because the 10-consecutive-checks criterion had been met.
  • The Prognosis process that sends an alert typically takes between 30 seconds and 3 minutes. This scenario seems unlikely, but it is possible.
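The timing bounds in these bullets can be worked out with simple arithmetic. The Python sketch below is a hypothetical model, not a Prognosis formula: it assumes the failure can happen at any point within a check interval and adds the 30-second to 3-minute alert delivery time quoted above.

```python
from datetime import timedelta


def alert_window(log_after: int, interval: timedelta,
                 delivery_min: timedelta, delivery_max: timedelta):
    """Rough bounds on the time from failure to alert arrival (hypothetical model).

    Earliest: the failure happens an instant before a check, so the first
    "true" check is immediate and the last one comes (log_after - 1) intervals
    later. Latest: the failure happens just after a check, adding one interval.
    """
    earliest = (log_after - 1) * interval + delivery_min
    latest = log_after * interval + delivery_max
    return earliest, latest


earliest, latest = alert_window(10, timedelta(minutes=1),
                                timedelta(seconds=30), timedelta(minutes=3))
print(earliest, latest)  # 0:09:30 0:13:00
```

The "9 minutes 1 second" figure corresponds to a failure 1 second before a check; the sketch reports the limiting case of 9 minutes before delivery time is added.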

 

Alternate test:

Run the same test, but wait only 5 minutes, and when the tape device is put back in service for a short time, verify that Prognosis shows the status of the tape device as up.

  • In this scenario, the Threshold Log After counter should reset to 0 and wait for 10 more minutes of the device being down before alerting.
  • If a different behavior is seen, please raise a Support case so further research specific to this environment can be done.

 

Thank you,
Scott Baldwin

 

gunstonp
IR Partner

Re: Threshold condition duration

Hi Scott

 

Thanks for your time on this.

 

I can see that I've hit on some timing issues with my test scenario and you've explained the reasons so I'm happy with that.

 

I ran the alternative test as you suggested...

I stopped my tape device and saw no threshold trigger; running up a Display, I can see that Prognosis recognises the device went down at 09:52. After 5 minutes there was still no threshold trigger, as expected, so I restarted the device, and the Display shows the device back up (after its 30-second refresh interval). In that time there was no threshold trigger, which is all good.

As an extra test, I stopped the device again at 09:59. I wanted to see whether the trigger would happen at 10:02, which would mean it was still counting 10 check intervals from the first down at 09:52. I'm happy to say that it didn't; the trigger happened at 10:09, which is just what I want.
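Paul's timeline is consistent with a counter that resets whenever a check sees the device up. The Python sketch below reconstructs it under stated assumptions: checks on each whole minute, the device back up at exactly 09:57, and the second stop at 09:59:30 (the seconds and the date are hypothetical), with alert delivery time ignored. `first_alert_time` and `at` are illustrative helpers, not Prognosis functions.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Period = Tuple[datetime, Optional[datetime]]  # (went down, came back up or None)


def first_alert_time(down_periods: List[Period], start: datetime, end: datetime,
                     interval: timedelta, log_after: int) -> Optional[datetime]:
    """Time of the check that completes log_after consecutive "down" results."""
    consecutive = 0
    t = start
    while t <= end:
        down = any(s <= t and (e is None or t < e) for s, e in down_periods)
        consecutive = consecutive + 1 if down else 0   # an "up" check resets the count
        if consecutive >= log_after:
            return t
        t += interval
    return None


def at(hour: int, minute: int, second: int = 0) -> datetime:
    return datetime(2020, 1, 1, hour, minute, second)  # hypothetical date


# Hypothetical reconstruction: down 09:52, up 09:57, down again 09:59:30.
periods = [(at(9, 52), at(9, 57)), (at(9, 59, 30), None)]
print(first_alert_time(periods, at(9, 50), at(10, 30), timedelta(minutes=1), 10))
# 2020-01-01 10:09:00
```

Under these assumptions, the five "down" checks from 09:52 to 09:56 are discarded when the 09:57 check sees the device up, and counting restarts at 10:00, giving 10:09 rather than the 10:02 a non-resetting interpretation might suggest.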

 

Thanks for your help on this. I've now rolled this change out to my live environment and my users are happy!

 

Paul  

SCOTT_BALDWIN
Expert

Re: Threshold condition duration

Hi Paul,

 

Thank you for the detailed feedback! Great to hear the changes made the alerting work as desired!

 

To help anyone else viewing this post, (if you have time) please mark the answer post as a solution so the backend search will find this solution more quickly.

 

Thank you,
Scott Baldwin

 
