This repo contains the interferon gem, which lets you store your alerts configuration in code. To use it, create your own repository with a Gemfile that imports the interferon gem.
For an example of such a repository, along with example configuration and alert files, see https://www.github.com/airbnb/alerts_example
This gem provides a single executable, called interferon.
You are meant to invoke it like so:
$ bundle exec interferon --config /path/to/config_file
Additional options:
-h, --help -- prints out usage information
-n, --dry-run -- runs interferon without making any changes to alerting destinations
The configuration file is written in YAML.
It accepts the following parameters:
verbose_logging -- whether to print more output
alerts_repo_path -- the location of your alerts repo, containing your interferon DSL files
group_sources -- a list of sources which can return groups of people to alert
host_sources -- a list of sources which can read inventory systems and return lists of hosts to monitor
destinations -- a list of alerting providers, which can monitor metrics and dispatch alerts as specified in your alerts DSL files
For more information, see the config.example.yaml file in this repo.
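As a rough illustration of the parameters above, the following Ruby sketch parses a configuration and reports which of the documented top-level keys are missing. The helper name missing_config_keys and the sample values are hypothetical and are not part of the gem. The layout of the entries inside group_sources, host_sources, and destinations is not covered here, so rely on config.example.yaml for that.

```ruby
require 'yaml'

# Top-level parameters named in this README.
EXPECTED_KEYS = %w[
  verbose_logging
  alerts_repo_path
  group_sources
  host_sources
  destinations
].freeze

# Hypothetical helper (not part of interferon): returns the documented
# keys that are absent from a YAML configuration string.
def missing_config_keys(yaml_text)
  config = YAML.safe_load(yaml_text) || {}
  EXPECTED_KEYS.reject { |key| config.key?(key) }
end

# Placeholder values for illustration only; they are not taken from
# config.example.yaml.
sample = <<~CONFIG
  verbose_logging: true
  alerts_repo_path: /path/to/alerts_repo
  group_sources: []
  host_sources: []
CONFIG

puts missing_config_keys(sample).inspect
```

A check like this could run before invoking interferon with --dry-run, so that a missing section is caught early.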
This repo knows about four kinds of objects:
Host Sources
Datadog
Datadog is our only alerting destination at the moment. Datadog's alerting syntax rules are documented here: http://docs.datadoghq.com/api/#alerts
Here's a chart explaining the Datadog metric syntax (generated via asciiflow):
  +---------+ alert condition +-------------------------------------------------+
  |                                                                             |
  |            +-----+ metric to alert on                                       |
  |            |                                                                |
  |            |         tags to slice the metric by +------+                   |
  |            |                                            |                   |
  v            v                                            v                   v
|----------| |-------------------------||--------------------------|          |---|
max(last_5m):avg:haproxy_count_by_status{role:<%= role %>,status:up} by {host} > 0
^      ^      ^                                                           ^
|      |      |                                                           |
|      | +----|------------------------------+                            |
|      | | math on the metric over all tags  |                            |
|      | |-----------------------------------|           +------------------------------------+
|      | | * max, min, avg, sum              |           |trigger a separate alert for each   |
|      + +-----------------------------------+           |different value of these tags the   |
| +----|----------------------------------------------+  |entire `by {}` clause can be omitted|
| | the interval to look at; always starts with last_ |  +------------------------------------+
| |---------------------------------------------------|
| | * 5m, 10m, 15m, 30m                               |
| | * 1h, 2h, 4h                                      |
+ +---------------------------------------------------+
+-------------------------------------------------------------------------------------------------+
| metric condition, can be one of: |
|-------------------------------------------------------------------------------------------------|
| * max: the metric gets this high at least once during the interval |
| * avg: the metric is this on average during the interval |
| * min: the metric is this small at least once during the interval |
| * change: the metric changes this much between a value N minutes ago and now (raw difference). |
| * pct_change: the metric changes this much between a value N minutes ago and now (percentage). |
+-------------------------------------------------------------------------------------------------+
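To make the pieces of the chart concrete, here is a minimal Ruby sketch that assembles a query string from the labelled parts. The helper datadog_query and its keyword names are hypothetical and are not part of interferon or the Datadog API. The role value web is a placeholder for the templated <%= role %> in the chart.

```ruby
# Hypothetical helper (not part of interferon): joins the parts labelled in
# the chart into a single Datadog alert query string.
def datadog_query(condition:, interval:, aggregator:, metric:, tags:, comparison:, by: [])
  # Tags that slice the metric, e.g. role:web,status:up
  tag_filter = tags.map { |key, value| "#{key}:#{value}" }.join(',')

  # The interval always starts with last_
  query = "#{condition}(last_#{interval}):#{aggregator}:#{metric}{#{tag_filter}}"

  # The whole `by {}` clause can be omitted
  query += " by {#{by.join(',')}}" unless by.empty?

  "#{query} #{comparison}"
end

puts datadog_query(
  condition: 'max',                      # metric condition
  interval: '5m',                        # interval to look at
  aggregator: 'avg',                     # math on the metric over all tags
  metric: 'haproxy_count_by_status',     # metric to alert on
  tags: { role: 'web', status: 'up' },   # 'web' is a placeholder role
  by: ['host'],                          # one alert per distinct host
  comparison: '> 0'
)
```

Leaving out the by argument drops the `by {}` clause, which yields a single alert across all matching hosts instead of one per host.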
Groups
Groups come from group_sources. There is currently a single group source, which reads groups from YAML files on the filesystem. We would like to add more group sources, such as LDAP-based ones.
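The idea of a filesystem group source can be sketched in Ruby as follows. This is not the gem's implementation: the helper load_groups and the file layout (a name key plus a people list) are assumptions made for illustration, so check the example alerts repository for the real group file format.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical reader (not the gem's code): maps each group's name to its
# members, assuming every YAML file holds a `name` and a `people` list.
def load_groups(dir)
  Dir.glob(File.join(dir, '*.{yaml,yml}')).sort.each_with_object({}) do |path, groups|
    data = YAML.safe_load(File.read(path)) || {}
    groups[data['name']] = Array(data['people'])
  end
end

# Write one assumed-format group file into a temporary directory and read it back.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'ops.yaml'), "name: ops\npeople:\n  - alice\n  - bob\n")
  puts load_groups(dir).inspect
end
```

An LDAP-based source could expose the same name-to-members mapping, which would keep the rest of the pipeline unaware of where groups come from.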