I have a lot of thresholds set up related to syslog hits. Some of the events I'm watching for generate roughly 5-10 of the same syslog message within a one-minute period. On my thresholds, I tried increasing the "check interval" slightly, and the "log at most every" field is set to 5 minutes, but this doesn't seem to help: if 12 of the same syslog message are received, I get alerted 12 times via email. My guess is this has something to do with how syslogs are captured by IR... but maybe I'm just missing something? My goal is to get only one alert in total.
Thank you for the query.
You might check whether Acknowledgement is ticked on the Timing tab; having this ticked should trigger only one alert and then await operator acknowledgement. If it still sends multiple alerts, please log a support case and send us the threshold for analysis.
The other thing to ensure, due to the way the PrognosisEvent / Syslog record works, is that your threshold is bounded to a timeframe; otherwise you get this type of behavior. There are a couple of records in Prognosis where the same data will repeatedly show up interval to interval, and for those you need to use the timestamp field within the record so that you're only looking at a limited timeframe, which prevents multiple alerts.
Also keep the check interval in your Timing settings in mind and ensure the two align as desired; it sometimes requires a bit of tweaking. We recently used this approach with another customer on Syslog who had 6000+ alerts, and it solved the problem.
The Ack Required method @David_Sun mentioned doesn't solve the root problem for many customers, and can create another set of problems to address, especially when operators don't want to have to acknowledge alerts; if not applied sparingly, it can result in situations being missed. So while it does change the behavior flow, from the sounds of it, it's not the behavior flow you are looking for.
Thank you for the replies. I think I get what you're discussing with timestamps, but I don't know what I should change within my thresholds to accommodate this?
If you're using MPEvent (PRGNEVT), which is where Windows Events and AIX & Linux Syslog natively go when Prognosis is onboard, use the EventTime field. Our offboard collector, I believe, uses the Syslog (SYSLOG) record, where the field is Timestamp.
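To illustrate the idea, a time-bounded where clause might look something like this (field names as above; the `^CURRTIME - 1 Minutes` form follows the analyst example syntax, but treat this as a sketch and check the exact field names and syntax against your record definitions):

```
On MPEvent (PRGNEVT):  EVENTTIME > ^CURRTIME - 1 Minutes
On Syslog (SYSLOG):    TIMESTMP > ^CURRTIME - 1 Minutes
```

Bounding the clause to the last interval means rows older than the window no longer re-match on every check.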
As Christopher pointed out, event type records can be a bit tricky to monitor with a threshold. It's largely because they're Deliver Differences records, and there's a possibility of receiving multiple instances of the same alert in a single interval or over multiple intervals. To monitor these kinds of records, a better solution is to use an analyst, as it's capable of doing more complex tasks.
I've attached an example analyst rule that provides one possible way to reduce the volume of alerts being generated.
I'll try to explain what the rule is doing and hopefully you'll be able to make sense of it.
There are two global variables defined at the top. One is used as a flag to tell both rules whether they actually need to do anything in a given check interval.
The other is to capture a timestamp that tells the second rule if it's been more than 10 minutes since the last trigger time, in which case it should reset the flag.
The first rule, SlgKrnlEmerg, checks the SYSLOG record every 1 minute 10 seconds using a where clause of:
TIMESTMP > ^CURRTIME - 1 Minutes AND FACILITY = 0 AND SEVERITY = 0
So any messages that have been logged within the last minute that are from the Kernel facility with a severity of emergency (these are defined in the field definition), would trigger the rule.
An analyst gives you a bit more functionality than a threshold; one example is that it's able to sort based on a field you define.
In this case, I've sorted on TIMESTMP in 'DESCENDING' order, so the most recent message comes first.
An analyst is also able to apply a LIMIT to the number of rows it will process, which I've set to 1.
This prevents the rule from generating a separate alert for every message that meets the criteria in the where clause.
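To make the filter-sort-limit behavior concrete, here's a minimal Python sketch of what the rule effectively does in one check interval. The sample rows, field positions, and values are illustrative assumptions, not actual SYSLOG record data:

```python
from datetime import datetime, timedelta

# Hypothetical syslog rows as seen in one check interval:
# (timestamp, facility, severity, message)
now = datetime(2021, 1, 1, 12, 0, 0)
rows = [
    (now - timedelta(seconds=50), 0, 0, "kernel emergency A"),
    (now - timedelta(seconds=30), 0, 0, "kernel emergency A"),
    (now - timedelta(seconds=10), 0, 0, "kernel emergency A"),
    (now - timedelta(minutes=5), 0, 0, "old kernel emergency"),
    (now - timedelta(seconds=20), 3, 2, "daemon error"),
]

# Where clause: logged within the last minute, kernel facility (0),
# emergency severity (0)
matches = [r for r in rows
           if r[0] > now - timedelta(minutes=1) and r[1] == 0 and r[2] == 0]

# Sort DESCENDING on timestamp, then LIMIT 1: only one row triggers the rule
matches.sort(key=lambda r: r[0], reverse=True)
limited = matches[:1]
```

With three matching messages in the window, `limited` still holds a single row, which is why only one alert fires per interval.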
The second rule TenMinCheck uses the PNODES record. Every Prognosis node returns 1 row of data for this record, so using a where clause of "ALL" should ensure this rule triggers every interval.
We aren't interested in what that record contains, we just need it to trigger every time.
All this rule does is check whether the timestamp held in the _TrTime variable is older than 10 minutes and whether the trigger flag has been set.
If it meets that criteria, it will switch the flag back off allowing the SlgKrnlEmerg rule to trigger again.
This is just an example of how you might be able to reduce the volume of alerts being generated.
The SlgKrnlEmerg rule will dispatch an email to the SAMPLE DISPMAN profile and log a message to the PROBLEM log.
The TenMinCheck rule will log a message to the PROBLEM log to indicate it's switched the flag back.
To use the attached analyst rule:
- Log into the GUI.
- Click 'File>New>Analyst'.
- Give it a name e.g. SyslogExample.
- Click on the 'Analyst Rules' tab.
- Enter a name in 'Analyst' and a file name in 'Rules File' e.g. SlgExample.txt
The rules file will be created under \Prognosis\Server\Configuration if it doesn't exist.
- You can either copy the contents of the attached SlgExample.txt file and paste into the window on this tab, or copy the file into the directory mentioned on the line above.
- Click 'OK' and then 'Yes' to save a copy of the analyst document in the GUI.
You can then start it by double-clicking it, or by dragging and dropping it onto the node you want to run it on.
Getting the syntax right can be a bit of a challenge until you're familiar with it; however, you can find a lot of useful information in the Product Guide if you want to give it a try.
Hope you find this helpful!