I have an alert configured to check for a certain message and send me an email when that message appears.
When these messages occur, there are a lot of them (2-3 messages per second) until a corrective action is taken.
What I would like to do is configure my alert so that it checks every 1 minute and only sends me an alert every 10 minutes. There are probably thousands of messages in that time span, but for that particular message, I just need 1 email every 10 minutes.
How would I configure that?
Here is a test alert that I thought I had configured to check every 5 minutes and send me an alert every 30 minutes. It didn't work: it sent me thousands of emails.
Thank you for including the screen capture.
To set the Threshold Condition to check every 1 minute and only alert every 10 minutes:
If the Threshold Condition still generates thousands of emails, please provide a screen capture of the Where Clause tab so we can suggest further ways to reduce the number of emails.
It depends on the key identifier for the data. If the same or a similar message has a different identifier (key value), it will trigger a new alert for each new message (row) coming in. Which record are you using? Then we can confirm whether your desired behavior is achievable.
Hi @HenryH,
For this record, EventTime + EventSequenceNum + EventNumber are the key fields. So if the messages arrive with a different EventTime (or any different combination of the above), each one triggers its own ON event and hence fires an alert/email.
One alternative would be to write an Analyst that, instead of looking at one row, looks at a summarized/combined result and triggers only one alert. Alternatively, a new record keyed on the event text (instead of the time) could be created so that the alert triggers only once.
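To illustrate why the key matters, here is a small Python sketch (not Prognosis code). The `Event` tuple, the sample burst, and the `count_alerts` helper are hypothetical and only mimic the idea that every distinct key value raises its own alert.

```python
from collections import namedtuple

# Hypothetical stand-in for a row of the event record.
Event = namedtuple("Event", "event_time seq_num event_number text")


def count_alerts(events, key_fn):
    """Count distinct key values; each new key would fire its own alert."""
    return len({key_fn(e) for e in events})


# Assumed burst: 3 identical messages per second for 10 seconds.
events = [
    Event(event_time=t, seq_num=t * 3 + s, event_number=2004, text="connection refused")
    for t in range(10)
    for s in range(3)
]

# Keyed the way the thread describes: time + sequence number + event number.
by_record_key = count_alerts(
    events, lambda e: (e.event_time, e.seq_num, e.event_number)
)

# Keyed on the event text instead of the time.
by_text_key = count_alerts(events, lambda e: (e.event_number, e.text))

print(by_record_key, by_text_key)
```

With the time-based key every row is unique, so every row alerts; with the text-based key the whole burst collapses to a single alert.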
We don't know Analyst yet so we'll skip that.
So can I define a new key and use that in my alert somehow?
You mentioned a new record. Are you saying create a new NonStopEventEMS copy?
I ran into the same issue when using a threshold, so I wrote an Analyst to do the suppressing. Below is the code for an Analyst that looks for ACI.XPTCP events with event number 2004 and sends an event to $ACOL (operations watches this event collector for items to act upon) when it detects the first event. Should the event get logged a second time, it logs a message to $ACOL indicating that messages are being suppressed. All other matching events are ignored until the 10 minutes have passed. The rule also counts the messages it suppressed, and if the messages continue to be logged, it reports the number of times the message was suppressed.
STRING AnalystName := "Test-Rules"
RULE Test-Rule-1 PRIMARY
WHERE event.ssidown = "ACI"
AND event.ssidname = "XPTCP"
AND event.eventnum = 2004
AND event.source = "$EMSDN"
REFRESH 10 SECOND
EVERY 10 SECOND
SET RuleName := "Test-Rule-1"
set hold_time := @event.time
IF hold_time > TS_Refused + (10 * 60 * 100 ) and ! ONLY LOG ONCE every 10 minutes
TC_Refused > 0 and ! there were suppressed events
TS_Refused > hold_time - (11 * 60 * 100 ) ! There were suppressed messages within the last 11 minutes.
SET msgText := subst ( "ACI.XPTCP.3000 #2004 events were suppressed
@TC_Refused@ times in the last 10 minutes." )
LOG EMS 00202 EVENTNUMBER 02004 COLLECTOR $ACOL IMMEDIATELY PRIORITY CRITICAL
IF hold_time > TS_Refused + (10 * 60 * 100 ) !ONLY LOG ONCE every 10 minutes
LOG EMS 00201 EVENTNUMBER 02004 COLLECTOR $ACOL IMMEDIATELY PRIORITY CRITICAL
SET TS_Refused := @event.time
SET TC_Refused := 0
SET TC_Refused := TC_Refused + 1
IF TC_Refused = 1
SET msgText := subst ( "ACI.XPTCP.3000 #2004 events are being suppressed
for up to 10 minutes." )
LOG EMS 00202 EVENTNUMBER 02004 COLLECTOR $ACOL IMMEDIATELY
MSG 00201 "@event.ssidown@.@event.ssidname@.@event.ssidvers@ #@event.eventnum@
@event.time@ @event.source@->@event.text@ from (@event.name@(@event.user@))
- Generated by (@AnalystName@.@RuleName@)"
MSG 00202 "@MsgText@ - Generated by (@AnalystName@.@RuleName@)"
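For anyone who doesn't read Analyst yet, the suppression logic described above can be simulated roughly like this in Python. This is a simplified sketch, not a translation of the rule: the `Throttle` class and its message strings are made up for illustration, times are plain seconds, and the extra check that suppression happened recently is left out.

```python
class Throttle:
    """Forward one alert per window and count what was swallowed in between."""

    def __init__(self, window_seconds=600):
        self.window = window_seconds
        self.last_sent = None  # time the last full alert was forwarded
        self.suppressed = 0    # events swallowed since that alert

    def on_event(self, now):
        """Return the list of messages to emit for an event seen at `now`."""
        out = []
        if self.last_sent is None or now > self.last_sent + self.window:
            # Window has expired: report what was suppressed, then alert again.
            if self.suppressed > 0:
                out.append(f"suppressed {self.suppressed} times in the last window")
            out.append("ALERT")
            self.last_sent = now
            self.suppressed = 0
        else:
            # Still inside the window: swallow the event, announce it only once.
            self.suppressed += 1
            if self.suppressed == 1:
                out.append("further events are being suppressed")
        return out


# Assumed input: one matching event per second for 1300 seconds.
throttle = Throttle(window_seconds=600)
messages = [m for t in range(1300) for m in throttle.on_event(t)]
print(len(messages), messages.count("ALERT"))
```

In this simulated run, 1300 incoming events produce 3 alerts plus a handful of suppression notices instead of 1300 emails.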
Thanks, Leon, for this great Analyst example. The other thing I do when someone wants to throttle alerts is something like the following in an Analyst.
This is from an ATM monitoring customer. It looks at 50 ATMs at a time, so it never generates more than 50 alerts per interval, and it presorts them. It also intentionally holds onto the alert for 15 minutes before doing anything, to see whether the alert closes by itself before any of the dispatching or recovery actions are executed.
where state = "open" and ( category <> "Comms" or ( category = "Comms" and opentime < CurrentTime - 30 minutes ))
!>******** Hancock mod END************
! refresh 30 seconds
name ( ATM )
! Process 50 at a time to save hammering the system too much on restart
sort asc opentime limit 50
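The selection in that where clause (skip Comms alerts until they have been open for the hold period, oldest first, at most 50 per pass) can be sketched in Python as below. The `select_alerts` function, the dictionary fields, and the sample data are assumptions made for illustration and are not part of the customer's Analyst.

```python
from datetime import datetime, timedelta


def select_alerts(alerts, now, hold=timedelta(minutes=30), limit=50):
    """Pick open alerts, holding back Comms alerts younger than `hold`."""
    ready = [
        a for a in alerts
        if a["state"] == "open"
        and (a["category"] != "Comms" or a["opentime"] < now - hold)
    ]
    ready.sort(key=lambda a: a["opentime"])  # oldest first
    return ready[:limit]                     # cap the work per pass


now = datetime(2024, 1, 1, 12, 0)

# Hypothetical sample: 60 open cash alerts plus a few special cases.
alerts = [
    {"id": f"cash-{i}", "state": "open", "category": "Cash",
     "opentime": now - timedelta(minutes=i)}
    for i in range(60)
]
alerts += [
    {"id": "comms-new", "state": "open", "category": "Comms",
     "opentime": now - timedelta(minutes=10)},   # still inside the hold period
    {"id": "comms-old", "state": "open", "category": "Comms",
     "opentime": now - timedelta(minutes=45)},   # held long enough
    {"id": "closed-1", "state": "closed", "category": "Cash",
     "opentime": now - timedelta(minutes=90)},   # already closed itself
]

batch = select_alerts(alerts, now)
batch_ids = [a["id"] for a in batch]
print(len(batch), batch_ids[0])
```

The hold period gives a Comms alert time to close by itself, and the limit keeps a restart from dispatching everything at once.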